UBC Theses and Dissertations

Comparison of fosmid libraries made from two geographic isolates of Caenorhabditis elegans Perkins, Jaryn Daniel 2011

COMPARISON OF FOSMID LIBRARIES MADE FROM TWO GEOGRAPHIC ISOLATES OF CAENORHABDITIS ELEGANS

by

Jaryn Daniel Perkins

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Zoology)

THE UNIVERSITY OF BRITISH COLUMBIA (VANCOUVER)

February 2011

© Jaryn Daniel Perkins, 2011

Abstract

To fill a need in the Caenorhabditis elegans community for genomic DNA held in manageably sized clones for complementation assays, a fosmid library was made from the N2 strain. These fosmid clones were aligned to the canonical sequence and cover 80% of the genome, but there were 396 gaps in contiguous coverage spread over the worm's six chromosomes. In an attempt to fill in some of these gaps in the original fosmid clones' sequence, we made another library from the Hawaiian geographic isolate CB4856. Our hope was that the divergence, inherent in the deletions containing 517 genes, between the two genomes would aid in the capture of previously gapped regions. This hope was justified. This thesis outlines the production and comparison of the two C. elegans fosmid libraries made from N2 and CB4856 and provides evidence that the way genomic libraries are made can affect the sequences packaged. Combining the two libraries, we now have a total coverage of 92.8% of genes and 90.43% of sequence in relation to the N2 canonical genome.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
1 Introduction
  1.1 Caenorhabditis elegans
  1.2 Fosmids
  1.3 Hawaiian Strain
  1.4 Cloning Issues Leading to Gaps
  1.5 A New Fosmid Library
2 Methods
  2.1 Fosmid Library Production
    2.1.1 Strains Used
    2.1.2 Growth
    2.1.3 DNA Preparations
      2.1.3.1 Phenol Purification
      2.1.3.2 Puregene Kit
      2.1.3.3 Further Purification
    2.1.4 Fosmid Preparation
  2.2 Fosmid Sequencing
  2.3 Fosmid Mapping
  2.4 Protein and Yeast Artificial Chromosome (YAC) Coverage of Libraries
  2.5 Repetitive Element Gap Co-occurrence
  2.6 Insertion and Deletion Determinations
  2.7 Misalignment Analysis
3 Results
  3.1 Construction and Description of a Fosmid Library for the Hawaiian Strain CB4856
  3.2 A Description of the N2 Library From WS210 Alignment
  3.3 Gaps in Fosmid Library Sequence
    3.3.1 Gaps in the WRMHS Library From the WS210 Alignment
    3.3.2 Gaps in WRM06 Library From Both Alignments
    3.3.3 Gaps Determined After Combining the WRMHS and WRM06 Fosmid Libraries
  3.4 Protein Coding Genes Covered by the Fosmid Libraries
  3.5 Alignment of Libraries to YAC Sequence in Genome
  3.6 Known Insertion and Deletion Coverage From the Libraries
  3.7 A Comment on End Alignment Procedures
4 Discussion
  4.1 Analysis of Clones Produced in the Two Geographic Isolate Libraries
  4.2 Protein Coding Gene Coverage and Gaps
  4.3 Correlation of Gaps With YAC-derived Coding Sequences
  4.4 Alignment Difficulties
References
Appendices
  Appendix A: Gap Positions for Each Library
  Appendix B: Examples of Unconventional Inserts

List of Tables

Table 1: Fosmid library coverage broken up by chromosome for WRM06 and WRMHS separately and combined.
Table 2: Gaps displayed by the WRMHS and WRM06 libraries separately and combined, separated by chromosome.
Table 3: Protein coding sequences falling in gapped regions of the libraries broken up by chromosome.
Table 4: Number of protein coding sequences, from the WS210 annotation, determined to be originally derived from Yeast cloning vectors and the total protein coding sequences not covered, by chromosome.
Table 6: Insertions and deletions seen in the CB4856 Hawaiian strain, compared to canonical N2 sequence, by chromosome.
Table 7: Permutations calculated for the blast hits produced by paired ends in the unconventional subgroup determined to align to two chromosomes.
Table B1: Fosmids with alignments too large or small to be packaged by λ phage.

List of Figures

Figure 1: Graphic depiction of the gap distribution across the chromosomes for the combined coverage of the WRM06 and the WRMHS libraries.
Figure 2: Venn diagrams depicting overlap of the WRMHS and WRM06 libraries.
Figure 3: Representation of the five unconventional inserts and the rearrangements that could produce them.
Figure A1: Graphic depiction of the gap distribution across the chromosomes for the WRM06 library.
Figure A2: Graphic depiction of the gap distribution across the chromosomes for the WRMHS library.

List of Abbreviations

#  number
%  percent
λ  lambda
µl  microlitre
°C  degrees Celsius
8P  8 times peptone media
BAC  bacterial artificial chromosome
Blast  basic local alignment search tool
bp  base pairs
CA  cytosine adenine
CGC  Caenorhabditis genetics center
CHCl3  chloroform
contig  contiguous overlapping DNA
CsCl  cesium chloride
DNA  deoxyribonucleic acid
EDTA  ethylenediaminetetraacetic acid
EtOH  ethanol
F1  first generation
GFP  green fluorescent protein
HAE  HEPES acetate EDTA buffer
HEPES  4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
hr  hour
I  chromosome I
i.e.  id est
IGEPAL  tert-octylphenoxy poly(oxyethylene)ethanol
II  chromosome II
III  chromosome III
indels  insertions and deletions
IV  chromosome IV
kb  kilo base pairs
KCl  potassium chloride
kD  kilodalton
L1  larval stage 1
M9  minimal media salt solution 9
mg  milligram
MgCl2  magnesium chloride
ml  millilitre
mm  millimeter
mM  millimolar
NaOAc  sodium acetate
ng/µl  nanograms per microlitre
NGM  nematode growth media
pH  power of hydrogen
RNAi  ribonucleic acid interference
rpm  revolutions per minute
SCODA  synchronous coefficient of drag alteration
SNP  single nucleotide polymorphism
TBE  tris borate EDTA buffer
TE  tris(hydroxymethyl)aminomethane hydrochloride EDTA buffer
TG  thymine guanine
Tris HCl  tris(hydroxymethyl)aminomethane hydrochloride
V  chromosome V
X  chromosome X
x  times
YAC  yeast artificial chromosome

Acknowledgements

I would like to thank past and present students Jason Maydan, Ryan Viveiros and Adam Warner for endless discussions and for providing ideas to circumvent problems that arose.
Special thanks are owed to Stephane Flibotte and Jeff Magnusson, whose guidance and bioinformatics prowess kept me from going blind staring at the reams of data necessary to complete this work. I would also like to thank Don Moerman, who gave me the opportunity to work and study in his lab. Finally, to Jenb, thank you for all of your support, compassion and understanding when I was most challenged.

1 Introduction

1.1 Caenorhabditis elegans

The suitability of Caenorhabditis elegans as a model organism for studies in developmental biology and exploration of genetic interaction networks was first described by Sydney Brenner (1974). The hermaphroditic nematode he portrayed is capable of self-fertilization, or of cross-fertilization with males formed by genetic nondisjunction. This fertilization plasticity provides for simple stock maintenance and the ability to construct strains with desired genotypes, or to perform complementation tests. The organism's 3.5-day life cycle at 20°C and large brood size (250-300) enhance the benefit that its clear body wall and eggshell provide for discerning structural variation from the outside of the worm. The easy and inexpensive strain maintenance, including freezing organisms for long-term storage, was also attractive for laboratory research (Brenner 1974). The discovery of an invariant cell lineage both post-embryonically (Sulston and Horvitz 1977) and embryonically (Sulston et al. 1983) provided further utility. This detailed description of the cell lineage led directly to the discovery of programmed cell death (Ellis and Horvitz 1986) and makes possible the genetic analysis of lineage commitment in development. Gene transfer technology (Fire and Waterston 1989; Mello et al. 1991), allowing extragenic complementation analysis and the introduction of artificial constructs, added to a growing list of experimental benefits the worm provides.
The power that gene transfer technology furnished was later exploited by Martin Chalfie and colleagues to demonstrate the usefulness of green fluorescent protein (GFP) for monitoring gene expression patterns in vivo (Chalfie et al. 1994).

An effort to create mutations in every gene within the worm is underway (reviewed in Moerman and Barstead 2008). Of the 20,000 expected genes described in Moerman and Barstead 2008, 7,000 mutations in 5,500 genes have been produced. Mutations in over 4,000 genes were produced by the Barstead, Moerman and Mitani international knockout consortium. The production of an 11,984-member library of coding sequence in a transferable vector allowed for whole-genome analysis of protein interactions (Li et al. 2004; Vaglio et al. 2003) and of expression (Huang et al. 2003; Luan et al. 2004). Finally, the discovery of RNAi (Fire et al. 1998), and the subsequent understanding that bacteria expressing double-stranded RNA could be fed to worms to interfere with protein expression (Timmons and Fire 1998), allowed the production of a simple genome-wide knockdown library (Fraser et al. 2000). These resources have allowed C. elegans to be one of the models at the frontline of genomics research on multicellular organisms. These studies were all made possible because C. elegans was the first multicellular organism with a completed genome sequence. The sequence provided a framework upon which these resources could be produced, and it relied in turn on the production of a physical map (Coulson et al. 1986) using cosmids and Yeast Artificial Chromosomes (YACs) (Consortium 1998).

With these genomic resources available, as well as the growing number of reverse genetic mutants, a comprehensive library of genomic DNA would provide a valuable addition as a method for complementation analysis and a system for producing functional fusions with native promoter regions.
A genomic library could find utility in allowing subcloning of regions of the genome that, due to size or repetitive elements, may be difficult to clone. The ideal library would be made of clones of a size that allows the containment of a majority of the regulatory elements necessary for native expression (Dolphin and Hope 2006). As the YAC and cosmid clones produced for the sequencing project can rearrange and have been lost, it became necessary to construct a library using a more stable vector.

1.2 Fosmids

Rather than relying on a library containing genomic DNA inserts of between 100 and 3,000 kb, such as a Bacterial Artificial Chromosome (BAC) library, making a 40-kb insert clone library was considered more useful and practical. The larger-insert YAC and BAC clones are not trivial to manipulate and, due to the large number of genes associated with them, often do not provide adequate genetic resolution (Bauchwitz and Costantini 1998; Giraldo and Montoliu 2001). As well, some inserts in these large vectors display instability (Neil et al. 1990; Song et al. 2001; Yokobata et al. 1991). We chose to make a fosmid rather than a cosmid library (Kim et al. 1992) to contain the 40-kb inserts. Cosmid libraries have been excellent tools, but many clones within the libraries have been prone to rearrangements or excision, resulting in a loss of viability due to their large size (Yokobata et al. 1991). In order to reduce the occurrence of such rearrangements, the fosmid vector pCC1FOS from Epicentre was used to maintain clones at low copy number until induced (Kim et al. 1992). The use of fosmids as backbones allows for the maintenance of large pieces of DNA (around 40 kb) at low copy number (1-5) per bacterial host.

In 2005, I produced a fosmid library that was end-sequenced and mapped to the current genome with five-fold coverage (Perkins et al., unpublished results).
This library has been used by many laboratories to study individual genes and in several whole-genome projects (Dolphin and Hope 2006; Tursun et al. 2009; Zhang et al. 2008). The fosmids are currently being used by the modENCODE project, headed by Robert Waterston at the University of Washington, to determine the genome-wide binding sites for transcription factors within C. elegans (Celniker et al. 2009). The 2005 library encompasses 80% of the genome and 84% of the genes within 15,784 clones. A library containing complete genetic coverage and similarly sized inserts, allowing even tiling across the genome, would allow constructs with similar sizes and backbones to minimize possible spurious vector effect differences and allow more complete analysis of the genome. It was decided that using a genetic variant of N2 might allow the cloning of sequences that had previously been unattainable in large-insert bacterial vectors. Regions that have inhibited packaging or propagation may be sufficiently different in a divergent genome to allow production and replication of clones. The Hawaiian geographic isolate of C. elegans was chosen for this purpose.

1.3 Hawaiian Strain

The Hawaiian isolate of C. elegans, CB4856, is the most divergent of all the geographic isolates from the canonical Bristol strain, N2. This divergence was determined initially by Single Nucleotide Polymorphism (SNP) analysis (Denver et al. 2003; Swan et al. 2002; Wicks et al. 2001) and later by copy number variation. CB4856 contains deletions in 517 genes compared to N2 (Maydan et al. 2007; Maydan et al. 2010). The divergence within the Hawaiian isolate has found utility, within the worm community, in the mapping of gene mutations using SNP variation between N2 and CB4856 (Flibotte et al. 2009; Maydan et al. 2007; Swan et al. 2002; Wicks et al. 2001). This use has prompted the re-sequencing of the CB4856 genome using Solexa sequencing (Marra, Moerman and Waterston, unpublished).
The divergence is so great between the two strains that a genetic incompatibility was noticed when crossing geographic isolates. The incompatibility is caused by a gene deletion in the Hawaiian isolate that causes a paternally associated embryonic arrest and lethality of a specific diplotype from a mating of the F1 generation (Seidel et al. 2008). This result implies a form of sex-linked gene silencing due to a persistent chromosomal or gene difference or alteration that allows for an abnormal pattern of inheritance. Genetic incompatibility is found in all of the known geographic isolates. Most isolates are compatible with either N2 or CB4856; only one isolate, from Roxel, Germany (Seidel et al. 2008), can mate successfully with both the Bristol and Hawaiian strains. The incompatibility results from balancing selection and allows for the maintenance of differing haplotypes within mixed populations. The haplotypes may be lost due to genetic drift favouring maintenance of non-detrimental alleles within a hermaphroditic population. With the number of gene deletions residing in the Hawaiian isolate, it has been suggested that a better understanding of the differences in genomic architecture between strains may provide clearer insight into the molecular mechanisms underlying this phenomenon (Seidel et al. 2008). This unique evolutionary system may provide an opportunity to explore how the genetic diversity within hermaphroditic populations is developed and maintained.

1.4 Cloning Issues Leading to Gaps

Twenty percent of the C. elegans genome was determined to be unclonable in Escherichia coli with cosmids (Waterston and Sulston 1995). The remaining gaps could only be covered using large-insert YACs (Consortium 1998). Unclonable DNA has been previously associated with the presence of repetitive elements, palindromic sequences, Z-DNA-forming sequences and/or methylated DNA causing deletion or rearrangements in bacterial hosts (Yokobata et al. 1991).
The presence of kinkable elements in DNA, made up of TG and CA dinucleotide repeats at specific intervals, can cause local unwinding of the double helix. The resultant single-stranded DNA is more likely to be part of molecular repair excision (McNamara et al. 1990; Razin et al. 2001) and may also result in unclonable regions due to the inability to process the secondary structures produced.

The regions that form these unclonable elements do not necessarily account for some of the inserts that have been seen during paired-end sequencing. Unconventional inserts are those whose ends appear to align to two different chromosomes, to align in the same direction, to align in opposite directions facing away from one another, or to align with calculated insert sizes too large or too small for packaging by the lambda phage packaging extract. Termed discordant (Tuzun et al. 2005) or invalid (Bashir et al. 2008; Volik et al. 2003), these unconventional inserts may show differences in chromosomal architecture between similar genomes (Blakesley et al. 2010; Flicek and Birney 2009; Tuzun et al. 2005; Volik et al. 2003). However, these larger chromosomal rearrangements may be too uncommon to account for the quantities of unconventional inserts seen in the N2 library, and the difference may be due to the misalignment of sequence.

Repetitive elements in a genome, when cloned, are frequently associated with inserts that do not align conventionally (Razin et al. 2001; Yokobata et al. 1991). These regions in general present unique difficulties for sequencing technologies. Multiple repeats of single nucleotides, dinucleotides or trinucleotides can produce polymerase slippage (Murray et al. 1993) or template switching (Odelberg et al. 1995) that may cause sequencing artifacts in these areas. Repeated elements can also produce problems with alignment due to small variations in sequence.
The variations can create difficulties discerning the elements from one another and can be problematic for positioning (Flicek and Birney 2009; Metzker 2010), even without considering the issues caused by missequencing. Without proper alignment, these regions cannot be tiled and gaps will occur in the contiguous sequence, leading to an incomplete genomic resource.

1.5 A New Fosmid Library

The production of the N2 library has garnered a lot of attention and, with it, inquiries and requests to produce more clones in the hope of filling in the gapped regions. The Moerman lab has also received requests for the production of a Hawaiian strain library. With the understanding that repetitive sequence may lead to misalignment and unclonable regions within bacterial cells, we thought a new library might be warranted if the genome to be sequenced differed significantly from the N2 DNA used to produce the first fosmid set. The Hawaiian strain is divergent enough to possibly alter repetitive sequences within the genome, thereby allowing some alignment not previously attainable. Even if this is not the case, the Hawaiian divergence (Maydan et al. 2010) is significant enough that the library could be a useful tool for exploring the architectural reasons for this genome difference. It could also provide a resource for individuals to further explore the genomic cause and effect of balancing selection, and its evolutionary role, as described previously. As well, the newly re-sequenced Hawaiian DNA provides the technical ability to explore the CB4856 genome. It will possibly allow better resolution of cloned sequences that align unconventionally to N2. This option is not available for other isolates at this time.

The focus of my thesis is the production and analysis of a fosmid library from the Hawaiian geographic isolate of C. elegans.
The work was undertaken in the hope that we could fill in some of the gaps found in the N2 library and possibly provide the community with a resource to explore genomic architecture and its evolutionary cause. We decided that five-fold coverage, the same as produced for the N2 library, would give the best chance of complementation for a reasonable cost. The resulting WRMHS library was end-sequenced and aligned to WS210, the 210th release of the Wormbase C. elegans genome. The N2 fosmids were realigned to this release to provide a comparison for the CB4856 library and to determine the level of library overlap occurring between the sets of fosmids.

2 Methods

2.1 Fosmid Library Production

2.1.1 Strains Used

Natural isolates sampled from Bristol, England (VC196, an N2 subculture received from the Caenorhabditis stock center) and Hawaii, U.S.A. (CB4856, an HA-8 subculture isolated from a pineapple field in 1972 (Caenorhabditis genetics center website 2007)) were used as the source of genetic material for this study.

2.1.2 Growth

Single animals were placed on a 60 mm plate of Nematode Growth Media (NGM) growing a lawn of Escherichia coli OP50 and allowed to grow until the F1 generation was laden with eggs. These were washed from the plate with M9 buffer (22 mM KH2PO4, 43 mM Na2HPO4, 86 mM NaCl, 1 mM MgSO4) containing 1% TritonX-100, and pelleted in a 15 ml polypropylene tube. The pellet was treated with 5 ml egg-prep buffer (20% household bleach, 5% 10 N KOH, 75% ddH2O) followed by vigorous shaking. Once the eggs were released from the carcasses, they were rinsed three times with M9 buffer. The washed egg pellet was re-suspended in 10 ml M9 buffer and distributed onto 40 plates (150 mm) containing 8P media on which a lawn of χ1666 E. coli was spread. These were allowed to grow until the F1 generation became gravid. The plates were washed into four 50 ml conical tubes with M9 buffer containing 1% TritonX-100.
These worms were rinsed with M9 buffer a minimum of three times. The number of rinses varied with bacterial content and turbidity of the supernatant. The pellets were then exposed to 25 ml of ice-cold egg-prep buffer and shaken vigorously until no visible worm carcasses were present. The egg-prep buffer was refreshed if the reaction went for more than eight minutes. The eggs were then washed six times in M9 buffer and left in 10 ml of M9 overnight, on a rotating platform, to hatch. The L1 worms were then washed three times with M9 buffer followed by once in TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA). The packed pellet was then frozen (-80°C).

2.1.3 DNA Preparations

DNA was prepared in two ways and subsequently mixed equally for all downstream applications.

2.1.3.1 Phenol Purification

Frozen pellets were placed on ice. Proteinase K lysis buffer (8.8 mM Tris-HCl pH 8.3, 44 mM KCl, 22 mM MgCl2, 0.4% Tween-20, 0.4% IGEPAL, 300 µg/ml Proteinase K, all in ddH2O) was added to the pellets in a volume equal to the pellet. These were incubated at 60-65°C for 1-2 hr, until there were no visible worm carcasses in the lysate. These reactions were inverted every 15 minutes during incubation. The lysate was immediately extracted with phenol:CHCl3:isoamyl alcohol (25:25:1). Equal volumes of lysate and phenol solution were inverted gently 10 times and placed in a centrifuge at 13,000 rpm for 5 minutes to separate the phases. The aqueous phase was removed and re-extracted. This was followed by one CHCl3:isoamyl alcohol (25:1) extraction of equal volume, performed in the same way. The aqueous portion was again removed and to it was added 0.1 volumes of NaOAc followed by 2 volumes of 100% EtOH. These reactions were placed at -20°C for a minimum of 30 minutes. The DNA was precipitated from solution by centrifugation at 13,000 rpm for a minimum of 30 minutes at 4°C. The pelleted DNA was washed with 70% EtOH.
The supernatant was removed and the precipitate was dried for 5 minutes at room temperature, open to the air. The DNA was then resuspended in 500 µl of TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA) containing 1 mg of RNase A and incubated at 37°C for one hour, inverting the tubes 10 times every 15 minutes. These reactions were re-extracted and precipitated with the procedure listed above. The DNA concentration was checked and the final volume made up to maintain a concentration of 500 ng/µl.

2.1.3.2 Puregene Kit

Pellets not subjected to phenol purification were treated with the Puregene tissue extraction kit from Gentra according to the manufacturer's instructions. The final DNA was diluted to a concentration of 500 ng/µl in TE buffer.

2.1.3.3 Further Purification

The combined DNA was further purified in two ways. The first aliquot was subjected to isopycnic centrifugation in a CsCl gradient. This was followed by butanol extraction to remove DNA-bound dye, and by buffer exchange (using an Amicon Ultra centrifugal filter with a 10 kD cutoff and Microcon YM-30 centrifugal concentrators) to remove CsCl and concentrate the DNA solution. Another aliquot of DNA was purified further using synchronous coefficient of drag alteration (SCODA), a method employing differential electrical fields to separate DNA from other molecules in a matrix (Pel et al. 2009), with electrophoretic washing.

2.1.4 Fosmid Preparation

Fosmids were produced using the Epicentre CopyControl Fosmid Library Production Kit. DNA was mechanically sheared to 25-50 kb using a 50 µl Hamilton syringe. The DNA was end-repaired, with Epicentre's End-It enzyme, to ensure blunt ends. The DNA was sized using pulsed field agarose gel electrophoresis (PFGE). The CsCl-purified DNA was sized in 1X HAE buffer (10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), 10 mM NaOAc, 0.5 mM EDTA, adjusted to pH 7.4 with NaOH) (Kinscherf et al. 2009).
The SCODAphoresis-purified sample was sized in 0.5X TBE (45 mM Tris base, 45 mM boric acid, 2 mM EDTA pH 8.0). In both cases, DNA with a minimum size of 25 kb and a maximum of 42 kb (as compared to DNA markers) was cut from the agarose gel. The agarose plug was digested with agarase and the DNA was buffer-exchanged using TE and concentrated using the Amicon and Microcon centrifugal filters described previously. These sized fragments were ligated to the pCC2Fos vector and packaged with a λ phage packaging extract, yielding a titre of 150,000 clones/ml, and the packaged phage were used to infect EPI300-T1R E. coli. Individual clones were plated on agarose and 15,360 colonies were picked, arrayed in a 384-well format and sequenced from both ends.

2.2 Fosmid Sequencing

Fosmids were grown in culture overnight by Canada's Michael Smith Genome Sciences Centre. These were then spun down and DNA was purified from the bacteria using Qiagen spin columns in 96-well formats. The DNA was sequenced using ABI 3100 and ABI 3730 capillary sequencers with the M13 forward and reverse primers. The sequence was trimmed for vector DNA.

2.3 Fosmid Mapping

The end sequences of the CB4856 library made for this thesis and of the N2 library made in 2005 were aligned with BLAST to the N2 reference genome version WS210 (Wormbase WS210 2010). The best BLAST matches were selected, based on bitscore and query length, for each input. These individual ends were paired with their respective alternate side. Single ends with no pair were discarded. All of the clones discarded from further analysis beyond this point are believed to have unconventional alignments. The orientations of the inputs were determined. Those fosmids whose two insert ends aligned to different chromosomes, or were found on the same strand, were discarded. The inserts whose ends aligned facing in opposite directions away from one another were also discarded.
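The pairing and orientation checks described so far can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used for the thesis: the `EndHit` record, its field names, the category labels, and the rule that a conventional pair has its plus-strand end starting at or before its minus-strand end are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndHit:
    """Best alignment of one fosmid end (hypothetical record layout)."""
    chrom: str   # chromosome name, e.g. "I" or "X"
    start: int   # leftmost reference coordinate of the hit
    end: int     # rightmost reference coordinate of the hit
    strand: str  # "+" or "-"


def classify_pair(a: EndHit, b: EndHit) -> str:
    """Label a pair of end hits using the orientation rules in the text."""
    if a.chrom != b.chrom:
        return "different_chromosomes"
    if a.strand == b.strand:
        return "same_strand"
    # Identify which end lies on the plus strand and which on the minus.
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # Assumption: ends face each other when the plus-strand hit is upstream.
    if plus.start <= minus.start:
        return "conventional"
    return "facing_away"


if __name__ == "__main__":
    left = EndHit("I", 1_000, 1_700, "+")
    right = EndHit("I", 38_000, 38_700, "-")
    print(classify_pair(left, right))  # conventional
```

Only pairs labelled conventional would be carried forward under this sketch; every other label corresponds to one of the discarded classes.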
The clones were then organized by size, and those smaller than 15 kb or larger than 55 kb were discarded, as packaging by the λ extract is unlikely outside of this range due to size restrictions imposed by the capsid head. The remaining clones were ordered and analyzed for completeness of genomic sequence. Both libraries were treated similarly to simplify comparison.

Gaps were determined by looking at the areas of non-contiguous clones. Both libraries were tiled together to determine the holes in the overlapping coverage for the combined fosmid sets. The total sequence not contained within the fosmid library was calculated from the combined non-contiguous regions within each library.

2.4 Protein and Yeast Artificial Chromosome (YAC) Coverage of Libraries

A list of all genes coding for proteins was pulled from Wormbase in the genomic data freeze WS210, a permanently frozen database available from Wormbase. The gaps in contiguous sequence were compared to the coordinates of each gene within the chromosome in question. Genes falling within the gaps were considered to be missing from the library. Total coverage was the difference between the missing genes and the total coding regions in the genome. Similarly, the Yeast Artificial Chromosome (YAC) coverage was determined by selecting those gene designations originally derived from YAC sources.

A χ² distribution was calculated for the two libraries' protein coding gene coverage. For it, a two-by-two contingency table was used to separate the Yeast-derived and cosmid-derived genes contained in the WRMHS and WRM06 libraries independently. The χ² value was compared to a 99.999% confidence interval with 1 degree of freedom.

2.5 Repetitive Element Gap Co-occurrence

The repetitive element gap co-occurrence was carried out in two ways. Determining the co-occurrence of Yeast-derived clone sequence with repetitive elements required static addresses to allow for the variable overlap seen with cosmids and other nearby YACs.
The protein-coding sequences were used as the static addresses. To increase the likelihood that a fair representation of the surrounding sequence was taken, 40 kb upstream and downstream of each protein-coding sequence was compared to the lists of repetitive elements. This distance was chosen to place the outer limit one fosmid insert length away from the protein-coding region. The YAC-derived coding regions were compared to the non-YAC-derived protein-coding regions for repetitive element distribution. For the gapped regions of the libraries, the gap endpoints were used as static positions to compare to the list of repetitive elements. These positions were compared to a theoretical region of similar size, defined as if the elements were equally distributed throughout the genome.

2.6 Insertion and Deletion Determinations

The known insertions and deletions held in the VC196 laboratory N2 strain (Maydan et al. 2007) and the CB4856 Hawaiian geographic isolate (Maydan et al. 2010) were compared to the list of inserts held in the WRMHS library. The clones in the WRMHS library containing the known insertions and deletions were quantified, and those complementing the deletion in our N2 strain are presented here and described in further detail.

2.7 Misalignment Analysis

After the fosmids were mapped to the genome and those with unconventional inserts were set aside, the clones with end sequences aligning to two different chromosomes were examined more closely. A Perl program, designed by me and written by Jeff Magnusson, was used to explore the alignments. The multiple BLAST outputs were categorized for the subgroup discarded due to alignment of paired ends on separate chromosomes. Those pairs of end alignments, from each clone, determined to be on the same chromosome were output in permutations allowing for conventional insertion only (i.e. those which are facing in opposite directions towards one another).
The sizes of the pairs falling into this class were calculated based on subject length, and only those between 10 kb and 60 kb were output. The output was quantified and separated by chromosome and by presence or absence of pairs aligning unconventionally (described previously in chapter 1.4).

3 Results

3.1 Construction and Description of a Fosmid Library for the Hawaiian Strain CB4856

I constructed the CB4856 library as described in chapter 2. It was labeled WRMHS. The WRM designation ties it to the original N2 library, and the HS classification reflects its use of DNA from the Hawaiian geographic isolate. Sequencing of 15,360 clones, performed at the Michael Smith Genome Sciences Centre, produced 28,630 end sequences, with 2,090 ends missing due to low quality or missing sequence. I BLAST-aligned the supplied sequences to the WS210 freeze. I found that 1,274 of the sequences did not have a matching pair. Other paired end sequences not used included 237 clones with ends facing the same direction, 479 with ends paired in opposite directions (facing away from each other) and 436 with ends aligned to two different chromosomes. A further 935 clones were calculated to have insert sizes larger than 55 kb or smaller than 15 kb, based on alignment. This left 11,358 clones for further analysis (Table 1). These clones comprising the CB4856 library covered 77.5% of chromosome I in 1,498 clones, 88% of chromosome II in 1,879 clones, 78% of chromosome III in 1,445 clones, 83% of chromosome IV in 1,755 clones, 87% of chromosome V in 2,459 clones and 96% of the X chromosome in 2,322 clones. In total, 85% of the genome is covered by the library.

The library coverage can be seen in further detail in Table 1. The table shows a striking difference between the CB4856 and the N2 library. The WRMHS library shows fewer clones with a greater mean size. The theoretical and actual coverage are both greater for almost all chromosomes in WRMHS.
Actual coverage was calculated as the quotient of the total sequence in each library and the sequence covered by fosmids. Theoretical coverage was calculated as the quotient of the total sequence in each library and the total sequence in each chromosome. The exception to the theoretical and actual coverage being better in WRMHS is Chromosome II, in which the percentage of sequence covered is still higher in the Hawaiian library. Only Chromosome X shows better sequence coverage as well as theoretical and actual coverage in the N2 library. Sequence coverage refers to the sum of genome regions covered by the library. The standard deviations calculated for the libraries show that the distribution of fosmid sizes is greater in the WRMHS library for almost all chromosomes and suggest a cleaner size selection during production.

Table 1: Fosmid library coverage broken up by chromosome for WRM06 and WRMHS separately and combined.

WRM06
Chr    Clones  Mean size  SD     Total sequence   Theoretical    Actual         Sequence       % of bp  Chromosome
               (bp)       (bp)   in library (bp)  coverage* (X)  coverage± (X)  coverage (bp)  covered  size (bp)
I      1,417   33,378     3,994  47,295,973       3.14           4.37           10,829,403     58.16    15,072,421
II     1,996   33,923     3,711  67,710,646       4.43           5.46           12,409,885     81.22    15,279,324
III    1,351   33,640     3,820  45,413,564       3.29           4.74           9,587,372      69.56    13,783,685
IV     1,687   33,676     3,837  56,845,804       3.25           4.21           13,493,564     77.13    17,493,793
V      2,708   33,714     3,750  91,298,681       4.36           5.16           17,705,635     84.62    20,924,143
X      3,330   34,035     3,753  113,335,132      6.40           6.57           17,253,419     97.32    17,718,854
Total  12,489  33,782     3,797  421,899,800      4.21           5.19           81,279,278     78.92    100,272,220

WRMHS
Chr    Clones  Mean size  SD     Total sequence   Theoretical    Actual         Sequence       % of bp  Chromosome
               (bp)       (bp)   in library (bp)  coverage* (X)  coverage± (X)  coverage (bp)  covered  size (bp)
I      1,498   34,162     4,579  51,141,145       3.39           4.38           11,677,217     77.47    15,072,421
II     1,879   34,691     4,823  65,218,492       4.27           4.86           13,407,032     87.75    15,279,324
III    1,445   34,674     4,265  50,138,428       3.64           4.64           10,812,288     78.44    13,783,685
IV     1,755   34,178     4,666  60,051,468       3.43           4.14           14,521,499     83.01    17,493,793
V      2,459   34,545     4,872  84,946,569       4.06           4.69           18,099,563     86.50    20,924,143
X      2,322   34,614     4,534  80,407,857       4.54           4.72           17,032,937     96.13    17,718,854
Total  11,358  34,495     4,653  391,789,540      3.91           4.58           85,550,536     85.32    100,272,220

Combined
Chr    Clones  Mean size  SD     Total sequence   Theoretical    Actual         Sequence       % of bp  Chromosome
               (bp)       (bp)   in library (bp)  coverage* (X)  coverage± (X)  coverage (bp)  covered  size (bp)
I      2,915   33,778     4,323  98,462,716       6.53           7.85           12,542,308     83.21    15,072,421
II     3,875   34,298     4,301  132,903,540      8.70           9.39           14,149,826     92.61    15,279,324
III    2,796   34,176     4,083  95,556,673       6.93           8.40           11,379,949     82.56    13,783,685
IV     3,442   33,937     4,283  116,810,335      6.68           7.47           15,636,289     89.38    17,493,793
V      5,167   34,110     4,340  176,245,250      8.42           9.06           19,448,251     92.95    20,924,143
X      5,652   34,273     4,102  193,710,826      10.93          11.06          17,519,113     98.87    17,718,854
Total  23,847  34,121     4,241  813,689,340      8.11           8.97           90,675,736     90.43    100,272,220

*Theoretical coverage of the library, relating to the number of times any one sequence in the genome would be covered by a fosmid.
±Actual coverage of the library, relating to the number of times the sequence held in fosmids would be covered.

3.2 A Description of the N2 Library From WS210 Alignment

Previously, I made a Bristol N2 fosmid library. It was produced in a similar fashion to the Hawaiian library. The major difference in packaging between the libraries is the vector backbone. In the N2 library, the 8.1 kb pCC1Fos vector from Epicentre was used. This was completed four years ago. The library was labeled WRM06, and its initial alignment was made to the WS140 genomic data freeze (Wormbase WS140 2005), a permanently frozen database available from Wormbase.
For this study, the WRM06 fosmids were realigned to the WS210 data freeze (Wormbase WS210 2010), enabling comparison between the two geographic isolate libraries.

The alignment of the Bristol N2 library to the WS210 genome showed a similar pattern of fosmids to the WS140 alignment. From the initial 15,744 fosmids, 653 clones did not align or provided imperfect sequence. A further 501 clones had only one arm align, while 548 clones aligned with arms on two different chromosomes. Another 140 clones had arms aligned in the same direction and 294 clones aligned with arms facing opposite directions, away from one another. These clones were separated for ease of analysis. The remaining 13,568 clones, all in proper orientation according to the reference genome, were used to construct a revised fosmid map for the N2 isolate. The insert sizes were calculated according to their paired end alignment. Those below 15 kb and above 55 kb were removed, leaving 12,489 fosmids to tile onto the genome.

The N2 clones were mapped to the chromosomes as follows. Chromosome I has 1,417 total clones covering 58% of the chromosome sequence. Chromosome II has 1,996 fosmids containing 81% of sequence, while chromosome III has 1,351 clones accounting for 70% of the chromosome. Chromosome IV has 1,687 fosmids aligning to 77% of the chromosome and chromosome V has 2,708 clones containing 85% of the chromosome sequence. The X chromosome has 3,330 fosmids accounting for 97% of sequence.

3.3 Gaps in Fosmid Library Sequence

The WRM06 and WRMHS libraries were sequenced to a depth calculated to provide five-fold coverage of the genome, based on the size of fosmid inserts. As can be seen in Table 2, complete coverage was obtained for neither genome. This is also evident in the difference between the actual and theoretical coverage of the library. WRM06 displays lower than expected coverage for both calculations for all chromosomes except for X.
For this chromosome, there was a slight discrepancy between its actual and theoretical values (6.57x and 6.40x, respectively), and areas of non-coverage within the chromosome were evident. The coverage difference displayed between X and all other chromosomes points to the unequal allocation of fosmids within this library compared to that produced from the Hawaiian strain. The following sections describe the gaps observed in the coverage of fosmid contiguous sequence.

Table 2: Gaps displayed by the WRMHS and WRM06 libraries separately and combined, separated by chromosome.

WRM06
Chr    Number   Total gap   Mean gap  Standard   % of   % of    Chromosome
       of gaps  size (bp)   (bp)      deviation  gaps   genome  size (bp)
I      74       4,243,018   57,338    63,441     18.09  28.15   15,072,421
II     72       2,869,439   39,853    49,298     17.60  18.78   15,279,324
III    59       4,196,313   71,124    81,446     14.43  30.44   13,783,685
IV     102      4,000,229   39,218    52,364     24.94  22.87   17,493,793
V      77       3,218,508   41,799    57,937     18.83  15.38   20,924,143
X      25       465,435     18,617    33,613     6.11   2.63    17,718,854
Total  409      18,992,942  46,438    60,319            18.94   100,272,220

WRMHS
Chr    Number   Total gap   Mean gap  Standard   % of   % of    Chromosome
       of gaps  size (bp)   (bp)      deviation  gaps   genome  size (bp)
I      75       3,395,204   45,269    54,102     16.52  22.53   15,072,421
II     74       1,872,292   25,301    33,529     16.30  12.25   15,279,324
III    65       2,971,397   45,714    48,036     14.32  21.56   13,783,685
IV     97       2,972,285   30,642    33,734     21.37  16.99   17,493,793
V      95       2,824,580   29,732    39,785     20.93  13.50   20,924,143
X      48       685,917     14,290    23,543     10.57  3.87    17,718,854
Total  454      14,721,675  32,427    41,357            14.68   100,272,220

Combined
Chr    Number   Total gap   Mean gap  Standard   % of   % of    Chromosome
       of gaps  size (bp)   (bp)      deviation  gaps   genome  size (bp)
I      61       2,530,113   41,477    40,983     20.75  16.79   15,072,421
II     51       1,129,498   22,147    20,751     17.35  7.39    15,279,324
III    54       2,403,736   44,514    42,368     18.37  17.44   13,783,685
IV     65       1,857,504   28,577    30,552     22.11  10.62   17,493,793
V      55       1,475,892   26,834    29,044     18.71  7.05    20,924,143
X      8        199,741     24,968    51,658     2.72   1.13    17,718,854
Total  294      9,596,484   32,641    35,189            9.57    100,272,220

3.3.1 Gaps in the WRMHS Library from the WS210 Alignment

The 11,358 clones of the WRMHS library were tiled on the WS210 genomic data freeze. The total breakdown is detailed in Table 2. From the tiling, 75 gaps were found in the contiguous sequence on chromosome I, amounting to 3,395,204 bp missing. Chromosome II was missing 1,872,292 bp in 74 gaps. Chromosomes III and IV had 2,971,397 bp in 65 gaps and 2,972,285 bp in 97 gaps, respectively. Chromosome V had 2,824,580 bp unaccounted for in 95 gaps and X was missing 685,917 bp in 48 gaps. The total sequence covered by this library is 85.3% of the genome with 85,550,545 bp included (displayed in appendix A).

3.3.2 Gaps in WRM06 Library from Both Alignments

The initial alignment of the library to the WS140 genome showed 73 gaps in contiguous sequence on Chromosome I, accounting for 4,227,363 missing base pairs. On Chromosome II, 68 gaps in the library removed 2,850,021 bp. Chromosome III coverage included 57 gaps with 4,133,578 bp missing. Chromosome IV had 100 gaps accounting for 3,887,566 bp of missing sequence. Chromosomes V and X showed 76 gaps with 3,200,103 bp missing and 22 gaps with 479,779 bp missing, respectively.

The 12,489 fosmids that were conventionally aligned to the WS210 genome and were within the proper size range showed a similar pattern of contiguous sequence to the WS140 alignment. The WRM06 fosmids left 409 gaps in contiguous sequence when tiled on the WS210 genome. The gapped sequence can be compared to WRMHS in Table 2. The coverage of the library was 81.1% of the genome with 81,279,278 bp included. The gaps seen in the library coverage are split between the chromosomes, with 74 gaps of mean size 57,338 bp found on I, 72 with mean size 39,853 bp on II, 59 with mean size 71,124 bp on III, 102 with mean size 39,218 bp on IV, 77 with mean size 41,799 bp on V and 25 with mean size 18,617 bp on X (displayed in appendix A).
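The tiling step behind these gap counts can be illustrated with a short sketch. The Python below is not the procedure used in this thesis. It assumes each conventionally aligned clone has been reduced to a (start, end) pair of 1-based inclusive coordinates on one chromosome, and it treats uncovered sequence at the chromosome ends as gaps, which is also an assumption. The function names and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; an assumed representation


def find_gaps(clones: List[Interval], chrom_len: int) -> List[Interval]:
    """Return the regions of one chromosome not covered by any clone."""
    gaps: List[Interval] = []
    covered_to = 0  # rightmost base covered so far
    for start, end in sorted(clones):
        if start > covered_to + 1:
            # No clone overlaps the stretch between the last covered base and this clone.
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    if covered_to < chrom_len:
        gaps.append((covered_to + 1, chrom_len))
    return gaps


def summarize(gaps: List[Interval]) -> Tuple[int, int]:
    """Number of gaps and total gapped sequence in bp."""
    return len(gaps), sum(end - start + 1 for start, end in gaps)


# Toy example with made-up coordinates, not data from the libraries.
toy_clones = [(1, 100), (50, 150), (201, 300)]
toy_gaps = find_gaps(toy_clones, chrom_len=350)
print(toy_gaps, summarize(toy_gaps))
```

Sorting the clones by start position means each clone only has to be compared against the rightmost base covered so far, so overlapping and nested clones are handled in a single pass.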
3.3.3 Gaps Determined after Combining the WRMHS and WRM06 Fosmid Libraries

By combining the libraries, complementing areas within gapped sequence were made visible. Even though the libraries had similar numbers of gaps, the gaps were not placed in identical locations. After combining the libraries, the total number of gaps in contiguous coverage dropped to 294, from 409 seen in the N2 library and 454 detected in the CB4856 library. These gaps are spread across all the chromosomes, with 2,530,113 bp missing in 61 gaps on chromosome I, 1,129,498 bp missing in 51 gaps on chromosome II, 2,403,736 bp missing in 54 gaps on chromosome III, 1,857,504 bp missing in 65 gaps on chromosome IV, and 1,475,892 bp missing in 55 gaps on chromosome V (Table 2). Of interest is the fact that the X chromosome is only missing 199,741 bp spread over eight gaps.

Table 2 shows breaks in contiguous sequence found in the N2 and CB4856 libraries, both separately and amalgamated. Production of the WRMHS library has clearly improved the genomic coverage. This can be seen when the alignments for the two libraries are compared side by side, separately and combined. The combined library shows fewer gaps with less sequence missing for each chromosome than either WRMHS or WRM06 separately. The layout of gaps over each chromosome can be seen in Figure 1. The independent libraries can be viewed for comparison in appendix A. The gaps found in the combined libraries are clustered in the arms of each chromosome, with larger holes found at the ends. Almost none are seen on the X chromosome. The standard deviation and mean size of gaps are also smaller for nearly every chromosome. This is seen in Figure 2. The combined libraries show a small decrease in mean gap size from the WRMHS library as a whole, even though the only chromosome displaying a decrease in mean is X. The libraries do not completely overlap, however, and the combined libraries still leave 9.6% of the genome with no fosmid coverage.
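One way to picture why combining the libraries reduces the gapped sequence: a region remains uncovered in the combined set only if it lies in a gap in both libraries. The following Python sketch illustrates that idea under stated assumptions. It is not the tiling procedure used here, the function name and toy coordinates are hypothetical, and each input list is assumed to hold non-overlapping gaps on a single chromosome. The final lines recompute the uncovered percentage from the combined total in Table 2.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; an assumed representation


def intersect_gaps(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions that are gaps in both libraries (two-pointer sweep over sorted lists)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    shared: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            shared.append((start, end))
        # Advance whichever gap finishes first; it cannot overlap anything further on.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Toy gaps with made-up coordinates, not data from the libraries.
gaps_n2 = [(100, 200), (500, 800)]
gaps_hawaiian = [(150, 300), (700, 900)]
print(intersect_gaps(gaps_n2, gaps_hawaiian))

# Uncovered fraction of the genome, using the combined totals from Table 2.
combined_gap_bp = 9_596_484
genome_bp = 100_272_220
print(round(100 * combined_gap_bp / genome_bp, 2))
```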
Figure 1: Graphic depiction of the gap distribution across the chromosomes for the combined coverage of the WRM06 and the WRMHS libraries. Top left in red shows chromosome I; top right in coral is chromosome II; middle left in yellow is chromosome III; middle right in green is chromosome IV; bottom left is chromosome V; bottom right in purple is chromosome X. X axis displays position of the gap on the chromosome. Y axis shows size of each gap.

Figure 2: Venn diagrams depicting overlap of the WRMHS and WRM06 libraries. The mean size (green) and standard deviation (blue) of gaps in the combined libraries is also presented.

Table 2 shows the trends in the contiguous sequence gaps for each library. It also shows that the WRMHS library is generally better on all measured levels described. The one exception is the number of gaps. The Hawaiian library shows more gaps; however, the mean size of the gaps is smaller for each chromosome and the total gapped area is smaller for every chromosome except for X. The gaps on X also account for the biggest discrepancy in the number of gaps. The distribution of the gaps across the chromosomes is more even and the standard deviation of gap sizes is smaller for the WRMHS library.

The number of gaps seen in both of the libraries may be somewhat misleading. Closer inspection of the unconventional clones shows that there may be inserts associated with some of the gapped regions. These are most readily seen in the fosmids determined to be too large for encapsulation by the λ phage packaging extract (Feiss et al. 1977). There are several instances of multiple clones (Tuzun et al. 2005) having ends that align on either side of a gap in contiguous sequence. The fosmids that are too large are likely not the only subcategory of unconventional inserts to affect the layout of the gaps.
3.4 Protein Coding Genes Covered by the Fosmid Libraries

Based on the WS210 data freeze, the WRMHS library showed fewer missing genes than did the WRM06 library, and this was consistent for all chromosomes except for X and V. The best coverage of coding regions was on the X chromosome, with 96.9% and 95.0% coverage for the N2 and CB4856 libraries, respectively. The other chromosomes did not show as complete coverage, with the poorest examples being III with 76.0% coverage for the N2 library and I with 82.0% coverage for CB4856. The overall coding sequence coverage contains 85.0% of all genes for N2 and 86.8% for CB4856 when the libraries are not combined. Table 3 displays the quantity of protein coding sequence missing from the libraries separately and combined.

Table 3: Protein coding sequences falling in gapped regions of the libraries, broken up by chromosome

Chr    Total genes  WRM06  % of Total  WRMHS  % of Total  Combined  % of Total
I      3154         707    22.4%       587    18.6%       389       12.3%
II     3805         580    15.2%       454    11.9%       213       5.6%
III    2914         712    24.4%       494    17.0%       373       12.8%
IV     3735         695    18.6%       571    15.3%       300       8.0%
V      5800         736    12.7%       723    12.5%       312       5.4%
X      3213         99     3.1%        165    5.1%        49        1.5%
Total  22621        3529   15.6%       2986   13.2%       1636      7.2%

When the libraries are combined, their coverage of coding regions overlaps so that roughly half as many genes are missing as from either library individually. The result is that of 3,392 and 2,986 genes missing from the N2 and CB4856 libraries, respectively, only 1,636 of the 22,621 genes present in the WS210 data freeze are still missing (7.2%) when the libraries are combined. Table 3 shows that the overlap is significant on each chromosome, but not complete.

Producing a CB4856 library had additional benefits. The overlap of coverage in previously tiled areas is more useful than initially expected. Genes that are not fully contained within a single fosmid or cosmid in the two N2 libraries are represented within the Hawaiian library.
One example of this is the unc-119 gene. When originally sequenced, this gene was not contained within a single contig, but split between two partially complementary cosmids. This gene is also not present in the WRM06 library and resides in a single gap on chromosome I. However, the WRMHS library has two fosmids containing the region of interest with at least 5 kb of upstream sequence. This is likely a result of the increase in depth of coverage and may therefore be observed for other genes.

3.5 Alignment of Libraries to YAC Sequence in Genome

To see if the fosmid libraries' cloning patterns are consistent with the original cosmid library, I compared their gaps in contiguous sequence to those regions sequenced only with yeast clones. The YACs were used to complete regions believed to be unclonable in bacteria during the sequencing project. I accomplished this by comparing the sequence unaccounted for in the fosmid libraries with the cosmid designation originally given during sequencing of the genome to determine vector origin. Those cosmids originally designated with the first letter Y were derived from YACs.

Table 4 shows the proportion of coding regions originally sequenced in YACs to the total coding regions falling inside gaps for each library independently and combined. It shows there is a marked increase in the proportion of the YAC-derived coding sequences lying within gaps in the libraries compared to total coding genes. The gaps in fosmid coverage show 62% and 58% YAC-derived genes for the WRM06 and WRMHS libraries, respectively. This is increased even further to 81% for the gaps remaining after the two libraries have been combined. The co-occurrence of gaps in the fosmid libraries with YAC-derived genes, representing sequence unclonable in the original cosmid library, provides evidence of the preferential exclusion of these regions from the libraries.
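The percentages quoted above follow directly from the totals in Table 4, as the short Python sketch below shows. The fold-enrichment figure it prints (the YAC-derived share within gaps divided by the genome-wide YAC-derived share) is an illustrative summary added here, not a statistic reported in this thesis, and the variable and function names are hypothetical.

```python
# Totals from Table 4: (YAC-derived genes in gaps, all protein-coding genes in gaps).
gap_genes = {
    "WRM06": (2190, 3529),
    "WRMHS": (1748, 2994),
    "Combined": (1328, 1636),
}
# Genome-wide totals from Table 4: YAC-derived genes and all protein-coding genes.
genome_yac, genome_total = 3966, 22621


def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


background = percent(genome_yac, genome_total)
for name, (yac, total) in gap_genes.items():
    in_gaps = percent(yac, total)
    # Fold enrichment is illustrative only (an assumption of this sketch).
    print(f"{name}: {in_gaps:.2f}% YAC-derived in gaps, "
          f"{in_gaps / background:.2f}x the genome-wide share of {background:.2f}%")
```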
Table 4: Number of protein coding sequences, from the WS210 annotation, determined to be originally derived from Yeast cloning vectors and the total protein coding sequences not covered, by chromosome

Y = protein coding sequences derived from yeast vectors; T = total protein coding sequences; % = percentage of T derived from yeast. For WRM06, WRMHS and Combined, the counts are sequences not covered by the library; for Total, the counts are for the whole genome.

       WRM06                 WRMHS                 Combined              Total
Chr    Y      T      %       Y      T      %       Y      T      %       Y      T       %
I      453    707    64.1%   415    587    70.7%   343    389    88.2%   695    3,154   22.0%
II     307    580    52.9%   181    454    39.9%   160    213    75.1%   522    3,805   13.7%
III    502    712    70.5%   371    494    75.1%   332    373    89.0%   691    2,914   23.7%
IV     492    695    70.8%   411    571    72.0%   265    300    88.3%   1039   3,735   27.82%
V      389    736    52.9%   322    723    44.5%   206    312    66.0%   861    5,800   14.8%
X      47     99     47%     48     165    29.1%   22     49     45%     158    3,213   4.92%
Total  2,190  3,529  62.06%  1,748  2,994  58.4%   1,328  1,636  81.17%  3,966  22,621  17.53%

The alignment of the libraries' gapped regions with sequence cloned originally only in yeast could suggest a mechanism inhibiting these regions from being packaged or propagated in bacteria. As repetitive sequence may cause both misalignment and challenges for packaging, I wanted to check whether there was a co-occurrence between these elements and the original cosmid library's unclonable sequence. Looking at the regions only captured in YACs and comparing them to gaps found in the recent fosmid libraries might provide insight into the reasons these regions could not be cloned or propagated.

By comparing the percentage of repetitive elements falling in yeast-derived clones to the percentage of protein coding sequences in YACs, I came up with a proportion value representative of the quantity of elements held in yeast-derived sequence as a proportion of the percentage of protein coding sequences.
By definition, this assumes every gene is associated with the same number of repetitive elements, so any deviation for a subgroup of elements will show up as a departure from a factor of 1. Table 5 shows that the YAC-derived sequence contains between 1.44x and 1.94x (chromosomes IV and II, respectively) the number of repeat elements per unit length that would be expected if elements were evenly distributed. For all yeast-derived sequences in the genome there are 1.85x the repetitive elements that would be expected for the same number of coding elements contained anywhere in the genome. These values do not mean that regions with an increased proportion of repetitive elements are entirely absent around cosmid-cloned genes. However, they do show that such regions are more frequent in YAC-derived sequence, with 79.5% of regions around genes displaying an increased amount of repetitive elements and only 29.8% of cosmid-derived sequence showing a similar pattern.

Table 5: Repetitive elements observed, as a proportion of the number that would be expected for a similar sized area if elements were distributed evenly over the genome

Chr    YAC (x)  WRMHS (x)  WRM06 (x)  Combined (x)
I      1.81     1.91       1.84       2.23
II     1.94     1.80       2.02       2.35
III    1.86     1.88       1.77       2.02
IV     1.44     1.75       1.82       2.16
V      1.82     1.86       2.16       2.55
X      1.71     2.13       2.60       2.91
Total  1.85     2.04       2.11       2.55

The increased density of repetitive elements in these stretches believed to be unclonable raised the question of whether the gapped regions of the fosmid libraries contained a concentration of repeat sequence similar to that seen in YAC-derived regions for the original cosmid library. To explore the abundance of repetitive elements over the gapped regions, the quotient of the proportion of elements falling into the gaps and the percentage of genome not contained within fosmids was calculated.
By definition, this assumes every region is associated with the same number of repetitive elements, so any deviation for a subgroup of regions will appear as a departure from a factor of 1. Table 5 shows the side-by-side comparison of the yeast-derived sequence to both the WRM06 library and WRMHS library as well as the combined libraries. A strong trend towards increasing quantities of repetitive elements was seen from the YAC-derived sequences to the WRMHS (2.04x) and the WRM06 (2.11x) libraries. The gaps seen in the libraries show fewer repetitive elements for the WRMHS library as a whole compared to the WRM06 library. This is true for most chromosomes except for I and III. The gaps remaining after the fosmid libraries are combined showed 2.55x the repetitive elements expected for the same length of DNA.

To test whether the distribution of genes produced in the Hawaiian library would be likely if the Bristol fosmids were sequenced to a depth equal to the combined libraries, a Pearson chi-squared test was performed. It was designed with a null hypothesis that there is no relationship between the two libraries' differences and the distribution of YAC or cosmid vector derived protein-coding genes displayed. The χ² value was determined to be 33.97, which is above the value necessary to achieve a confidence level of 99.999% with 1 degree of freedom. The null hypothesis can therefore be rejected, providing support for the alternate hypothesis that the ratio of YAC to cosmid derived genes is dependent on the two fosmid libraries' differences.

3.6 Known Insertion and Deletion Coverage of the Libraries

As stated previously, the Hawaiian isolate CB4856 is known to contain DNA sequence insertions and deletions (indels) relative to the N2 strain. To determine if clones were produced covering these indels, I compared the list of clones from the CB4856 library WRMHS to the list of indels described (Maydan et al. 2010). Table 6 shows the breakdown of the search.
In total, 439 fosmids span the regions in which 289 genes associated with 143 of the 181 CB4856 indels are found. The 38 remaining indels contain 334 genes. The indels from the entire list show a mean size of 7,343 bp. The 38 indels from the list of non-covered genes have a mean size of 71,568 bp. The large size of the indels may have contributed to the underrepresentation of these particular sequences in the fosmid library, especially as the majority are deleted sequence and larger clones were dismissed (see Discussion).

To explore the possibility that clones containing the larger deletions were overlooked due to size, the clones aligning to regions too large for packaging were investigated. For 10 of the 38 indels, a total of 34 individual fosmids show inserts calculated to be larger than the 55 kb cutoff and flank the individual indels' breakpoints. A prime example of two fosmids aligning with indels as well as gaps can be seen in appendix B, labeled in red (page 104). The end sequences of these clones cover a region with a 4,500 bp deletion of coding sequence, as well as a gap in contiguous sequence. All of the indels in the CB4856 genome, with the exception of eight, are associated with fosmid ends aligning in unconventional ways. The unconventional inserts may have ends found on different chromosomes, or ends found on the same strand and pointing in the same direction. Other unconventional clones have ends mapping to opposite strands and pointing away from one another, or were found to be out of the selected size range.
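The classes of unconventional insert listed above can be written as a simple decision rule. The Python sketch below is illustrative only and is not the program used in this work. It assumes each end alignment has been reduced to a chromosome, start and end coordinates, and a strand, and that a read on the + strand points towards higher coordinates. The 15 kb and 55 kb limits are those given in the Methods; the class names, type names and toy coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EndHit:
    """Best alignment of one fosmid end (an assumed, simplified record)."""
    chrom: str
    start: int
    end: int
    strand: str  # '+' reads towards higher coordinates, '-' towards lower


def classify_pair(a: EndHit, b: EndHit,
                  min_size: int = 15_000, max_size: int = 55_000) -> str:
    """Assign a paired-end alignment to a conventional or unconventional class."""
    if a.chrom != b.chrom:
        return "different_chromosomes"
    if a.strand == b.strand:
        return "same_direction"
    left, right = sorted((a, b), key=lambda hit: hit.start)
    if left.strand == "-":
        # Left end reads leftwards and right end reads rightwards: facing away.
        return "facing_away"
    insert_size = right.end - left.start + 1
    if insert_size < min_size or insert_size > max_size:
        return "out_of_size_range"
    return "conventional"


# Toy pair with made-up coordinates, not a clone from either library.
print(classify_pair(EndHit("V", 1_000, 1_500, "+"), EndHit("V", 35_000, 35_500, "-")))
```

Checking the chromosome and strand before the size means an insert size is only computed for pairs where it is meaningful, which mirrors the order in which clones were discarded during mapping.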
Table 6: Insertions and deletions seen in the CB4856 Hawaiian strain, compared to canonical N2 sequence, by chromosome

Chr    # genes    # genes in indels  # genes in indels  Total indels   # indels covered  # fosmids
       in indels  uncovered          covered            in CB4856 (#)  by fosmids        covering indels
I      28         9                  19                 20             14                39
II     217        95                 122                58             51                161
III    21         1                  20                 15             12                26
IV     70         56                 14                 16             10                25
V      279        168                111                68             54                180
X      8          5                  3                  4              2                 8
Total  623        334                289                181            143               439

The lab strain VC196 used to produce the Bristol isolate fosmid library is also known to carry a deletion in a single gene (Maydan et al. 2007). This is a 1,788-bp deletion affecting exons 5 and 6 of alh-2. This gene is found on chromosome V in the 3,352 bp area from position 1,644,378 to 1,647,729. Two fosmids produced from the Hawaiian strain complement this region. Both WRMHS24L15 and WRMHS02O10 contain the coding region as well as at least 5 kb upstream and downstream of the start and stop coordinates.

3.7 A Comment on End Alignment Procedures

The fosmid libraries produced were aligned to the WS210 data freeze of the C. elegans genome with BLAST alignment. BLAST alignment forgoes precision for speed, providing a list of possible matches to the sequences given. The BLAST alignment performed provided 1,398,885 individual BLAST hits for the 28,630 query ends. To streamline the process, the best hits were chosen according to length of alignment and the bit score, a measure describing the quality of alignments, with better scores produced by increased length and fewer gaps or mismatches. These best hits were aligned as pairs, and further manipulations were made with the assumption that these pairs were true pairs.

Using pairs of end sequences to align ends on the genome allows for identification of unconventional inserts.
Unconventional inserts are those seen aligning with both ends pointing in the same direction, with both ends pointing in opposite directions away from each other, to two different chromosomes, or with sizes that could not be packaged within the λ phage extract. All of these are seen within both the WRMHS and WRM06 libraries. These problem inserts may arise due to repetitive sequence alignment and due to misalignment by the BLAST algorithm, which can give higher scores to unlikely candidates.

To determine if this misalignment was the cause of some of the unconventional inserts found in the WRMHS library, several of these inserts were analyzed by examining the permutations of BLAST hits produced to see if there were other, more likely candidates. A Perl program was designed with the help of Jeff Magnusson to list all hits where complementary ends were found on the same chromosome. The program then discounted those hits where the ends were facing the same direction or in opposite directions away from one another. Finally, it discounted all hits which had ends more than 60 kb or less than 10 kb from one another.

Table 7 represents the output produced using a program to align and match all BLAST hits for each clone discarded due to ends aligning to two separate chromosomes. Of the 436 clones, 361 show multiple conventional alignments. Most of the clones with multiple conventional alignments show hits in multiple chromosomes. Alignments for the 361 clones had an average of 2.5 chromosome hits per clone. The multiple hits produced the 901 possible chromosome alignments for the set. The alignments determined to have conventional orientations, with ends on opposite strands facing each other, amounted to 10,949,876 placements. Even when these were trimmed down to take size into consideration, the clones aligning to segments of between 10 and 60 kb amounted to 123,268 addresses.
The 75 clones remaining, from the 436 initially tested, have no alternate single chromosome address and are more likely correct in their initial alignment.

Table 7: Permutations calculated for the blast hits produced by paired ends in the unconventional subgroup determined to align to two chromosomes

Chromosome   Permutations for 436 clones' different chromosomes   Permutations of paired ends aligning to same chromosome   Permutations of paired ends aligning to same chromosome in size range
I            145                                                  1,893,136                                                 2,849
II           160                                                  1,003,828                                                 6,879
III          134                                                  1,189,720                                                 977
IV           155                                                  955,206                                                   3,621
V            173                                                  777,269                                                   108,408
X            134                                                  5,130,717                                                 534
Total        901                                                  10,949,876                                                123,268

No valid permutation: 75

Many of these alignment possibilities have a similar alignment score for one or both sides of the pair, suggesting that these clones are most likely not due to translocations, but to a failure in selecting the most likely placement. However, there are 75 clones that do not align with both ends on a single chromosome within the selection size possible. These may be interesting candidates to examine more closely for larger chromosomal rearrangements in these areas.

Two such clones are seen in the group of 75, which may show a rearrangement. Both have ends that fall within 83 kb of one another on chromosome IV and 137 kb on chromosome II. The ends on chromosome IV are nestled between two gapped regions of 18.5 kb and 45.5 kb. Chromosome II has two ends falling into gapped regions in the fosmid library. Both ends on chromosome II fall within 5 kb and 20 kb of known indels and may be caused by an unbalanced translocation event resulting in the deletions seen. These associated deletions are encompassed in individual fosmids.

4 Discussion

Two libraries have been produced. The first, WRM06, is a fosmid genomic library using the canonical N2 Bristol strain of C. elegans. The second, WRMHS, is a Hawaiian geographic isolate derived fosmid library.
The CB4856 strain used has been determined to be the most divergent C. elegans isolate from N2 and represents a genome from the other half of a balancing selection evolved within the worm. The N2 library was created in 2006. The WRMHS library was made in 2010 as part of my Master’s project.  Initial production of the WRM06 fosmid library was done to provide the worm community with an alternative vector source to the cosmids produced in the worm genome-sequencing project, as many of these original clones have lost viability or rearranged over time. Besides their use in complementation assays, fosmids have garnered interest from those studying the whole genome. One application provides the opportunity to study gene function through recombineering (Dolphin and Hope 2006). The large size of the fosmids allows studies of native transcription/ translation with regulatory regions in the promoter and intergenic regions, as well as those possibly found downstream, for all but the few largest genes in C. elegans.  50  The N2 library was designed to obtain 5x coverage of the genome. The actual coverage was somewhat less due to misalignment and unclonable regions with gaps left in the library’s contiguous sequence. By using the Hawaiian strain CB4856 to produce another library, it was hoped that the fosmids would overlap to complement the areas and provide contiguous sequence over the entire genome. As there are significant differences in genome structure, which may relate to altered distribution of repeat elements between the two strains, this idea seemed feasible. With genomic variation providing sequence difference, allowing proper addressing and tiling, and molecular changes providing a different environment, on which bacterial machinery may be able to function, we hoped the gaps could be filled. The misalignment and molecular challenges impacting microbial DNA replication machinery, due to repetitive regions, are thought to cause the gaps in the imbricate sequence. 
This rationale appears justified as the combination of the two libraries did significantly reduce the number of gaps, seen in Table 2, and did increase the number of genes covered by fosmids, described in Table 3. These two libraries may find increased use in genomic studies and provide tools to determine the genomic structural changes that produced the differences between the Bristol and Hawaiian geographic isolates.  51  4.1 Analysis of Clones Produced in the Two Geographic Isolate Libraries  As stated in the results, clones from both libraries were end sequenced. The end sequences were blast aligned to the WS210 library and the positional data of the bestfit outputs were used to calculate the orientation and length of the inserts. The clones displaying unconventional inserts were separated and the fosmids containing conventional sequence were positioned on chromosomes. The separation of the unconventional inserts was necessary for tiling of the clones, but it may be premature to dismiss them outright as they could show structural differences from the annotated genome displayed as the canonical N2 sequence.  All of the different unconventional inserts may be caused by structural variation within the genome. This could be due to repetitive sequences, chromosomal rearrangements or possibly a combination of the two. Figure 3 illustrates possible structural mechanisms that could cause unconventional inserts. For example, a sequence tandemly repeated with repetitive elements bookending it might contain an insertion that represents the clone’s size smaller than it actually is (Figure 3B). Palindromic sequence and inverted repeats can cause clones to look as though the ends are pointing in the same direction or are in opposite directions away from one another (Figure 3d and 3e). 
The clones displaying inserts with ends aligning on two different chromosomes are not likely caused by tandem repeats, but may be misalignments caused by repetitive sequence or result from translocations that have occurred within the isolate in question (Figure 3A). Fosmids with insert sizes larger than would be expected are less likely to be caused directly by tandemly repeated sequences within the genome. However, the lack of a tandemly repeated region, when aligned to a genome that contains the repetitive element, may falsely represent the size of a clone. Such a clone would appear larger than is possible given the size limitations that λ phage packaging extract imposes on length (Feiss et al. 1977; Figure 3C).

Figure 3: Representation of the five unconventional inserts and the rearrangements that could produce them. Each line represents a chromosome depicting the end sequences on the left followed by the rearrangement on the right. Like coloured fragments represent like sequence. A) Shows a translocation between two chromosomes producing paired ends aligning to different chromosomes. B) Insertion creating a larger sized fosmid sequence than expected. C) A deletion in a tandem repetitive region, which would show up as a smaller insert than is expected. D) Either palindromic sequence or an inverted repeat causing both ends to appear to be on the same chromosome. E) A sequence with a spontaneous tandem duplication showing ends to be on opposite strands facing away from one another.

Clones displaying a size under the minimum allowed by the packaging extract are likely caused by misaligned or ambiguously aligned repetitive sequence, which makes the true location difficult to define. The undersized inserts may also be caused by an element inserted into the chromosome, making the end points appear to be far closer than would be allowed by the λ phage packaging extract (Feiss et al. 1977).
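The classes of unconventional insert discussed above can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not the procedure used to sort the library clones: the `EndHit` record, the `classify_pair` function, and the default size limits of 15 kb and 60 kb are hypothetical names and values chosen for the example, and insert size is approximated as the distance between one coordinate per end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndHit:
    """Best alignment for one end read (hypothetical record for this sketch)."""
    chrom: str   # chromosome name, e.g. "V"
    pos: int     # one genomic coordinate representing the aligned end
    strand: str  # "+" for the forward strand, "-" for the reverse strand


def classify_pair(a: EndHit, b: EndHit,
                  min_size: int = 15_000, max_size: int = 60_000) -> str:
    """Assign a pair of end hits to a conventional or unconventional class.

    The size limits are assumed values, not the thresholds used in the thesis.
    """
    if a.chrom != b.chrom:
        return "different chromosomes"
    if a.strand == b.strand:
        return "same direction"
    left, right = sorted((a, b), key=lambda hit: hit.pos)
    # A conventional pair has the left end on "+" and the right end on "-",
    # so the two reads point towards each other.
    if left.strand == "-":
        return "facing away"
    size = right.pos - left.pos
    if size < min_size:
        return "too small"
    if size > max_size:
        return "too large"
    return "conventional"


# Hypothetical coordinates, chosen only to exercise each branch.
conventional = classify_pair(EndHit("I", 100_000, "+"), EndHit("I", 135_000, "-"))
oversized = classify_pair(EndHit("V", 2_400_000, "+"), EndHit("V", 2_541_000, "-"))
```

In this sketch a pair is tested for chromosome and orientation before size, so a clone whose ends face away from each other is reported as such even when the distance between its ends happens to fall inside the packaging range.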
The fosmid clones with inserts calculated as too small to be packaged may be examples of ambiguous repetitive alignment. Of the 868 clones with aligned sizes calculated to be less than 15 kb, all but 56 fell into one of 27 groups of fosmids containing more than 4 members, in which the insert ends were aligned within 10 kb of one another (Appendix B). The largest of these groups consisted of 280 clones with ends between 15,056,645 and 15,072,421 on chromosome I. This is a region with repetitive sequence, which makes even the current chromosome organization questionable in this area (personal communication Robert Waterston). The N2 library contains a similar group in the same area containing 576 clones and 21 other groups with over 4 members. Each of these 27 groups in WRMHS and 21 groups in WRM06 is likely to provide evidence of regions of highly repetitive sequence within these genomes.

Discarding the clones that do not conventionally align simplifies the analysis of the library. It should be pointed out, however, that misaligned clones might fill in some of the gapped sequence. Evidence for this possibility can be found in the number of fosmids aligning to regions directly beside a gap. Of the 56 individual fosmids and the 27 groups of clones, 37 fosmids aligned directly beside a gap. Some of the groups had multiple clone alignments beside gaps. The analysis necessary to evaluate the correct positions of these fosmids within the genome would be time consuming and could cost as much as the sequencing in the original library. For these reasons, the nonconforming inserts were not analyzed further. However, making the clones available to interested parties will allow the alignment of some of these unconventionally aligned inserts, and the regions that produced the fragments, to be resolved.

With both libraries produced and end sequenced the overall quality of the fosmid resources could be compared to one another.
It is clear that the library produced from the Hawaiian geographic isolate is superior in most metrics displayed. Table 1 and Table 2 detail the individual libraries’ fosmids and their coverage of the canonical genome. There are only a few instances in which the N2 library produces more favourable results. This was not expected. The slightly smaller size of the Hawaiian library (384 fewer clones produced and sequenced) would alone suggest this is improbable. This incongruity is strengthened by the sequencing being somewhat less successful in the WRMHS library (28,326 ends) in comparison to the WRM06 (31,397 ends), resulting in a 9.1% smaller aligned Hawaiian library. The sequence divergence of the Hawaiian strain from N2 further decreases the likelihood of a better library being produced from the CB4856 isolate, because its clones were aligned to a canonical genome from which the strain diverges.

The regions of the genome displaying gaps in contiguous sequence within the libraries also exhibited differences between WRMHS and WRM06. The smaller total gapped area in contiguous fosmids seen in WRMHS was able to account for the smaller percentage of total protein coding regions associated with the gaps (Table 3). The decreased percentage of YAC derived coding regions from the total exhibited in WRMHS (Table 4) cannot be explained as easily. The lower concentration of repetitive elements in WRMHS gaps (Table 5) is also not as simply explained. The use of a divergent genome may have had some beneficial effects on library production that could account for these inequalities. The differences in difficulty seen in finishing some genomes (Blakesley et al. 2010) may have some equivalence within geographic isolate sequencing. The genomic divergence may contribute to the increased quality in the WRMHS library, though the relatively small disparity between the genomes would make it unlikely.
In production of the libraries, several adjustments were made to the protocol used to purify and size DNA. The procedure used to clone the DNA was largely unchanged from WRM06 to WRMHS. One protocol was used in the WRM06 library, but the WRMHS library used two different protocols to polish and purify. The increased attention to polishing and purification of the Hawaiian DNA may have provided 57  cleaner material, or slightly different chemical environments for the inserted molecules. Using two separate purification protocols each for half of the Hawaiian library may have also contributed to the differences in coverage and coding regions attained, which seems more likely than the divergence causing a decrease in difficulty cloning some areas. It is difficult to provide evidence to support either postulate with the present information. Further work will need to be done to distinguish between these two possibilities. Evidence may be found by producing a new library with each DNA purification protocol performed and packaged independently. These libraries could then be sequenced to a depth similar to, or greater than, the depth used in this study using next generation sequencing technology. A comparison of the output could then provide a clearer view of the role the chemical environment played in the fosmid libraries by controlling the variable of differential genome sequence and allowing side by side comparisons with a single changed variable.  Table 2 shows that coverage of both the WRM06 and the WRMHS libraries has produced contiguous fosmids for just over 90% of the annotated genome. The N2 library contains 409 gaps amounting to 18,992,942 bp of missing sequence or 19% of the genome. The CB4856 library is missing 14,721,675 bp or 15% of the genome in 454 gaps. When tiled together there are still 294 gaps remaining, but only 10% of the genome, or 9,596,484 bp remain unaccounted for. 
Lack of complete complementation of one library by the other is not surprising given the regions which do not have contiguous sequence. The clustering of gaps at the end of the arms of each 58  chromosome is visible in Figure 1. The libraries, independently and combined, have fewer gaps towards the centres of each chromosome and these gaps tend to be smaller. Likewise, very few and very small gaps are seen on the X chromosome. A similar pattern was observed when sequencing the worm’s genome, when groups were working through clusters of repetitive elements (Consortium 1998).  The co-localization of the gaps found in the fosmid and original cosmid libraries’ contiguous sequence with repetitive regions has previously been demonstrated in multiple vertebrate species (Blakesley et al. 2004). The pattern of gaps is most likely due to misalignment of arms of the inserts to repetitive elements elsewhere in the genome or to incorrect size estimation of properly addressed ends. Areas of repetitive sequence are frequently duplicated and translocated within a genome, causing larger chromosomal anomalies that can be difficult to align (Bishop and Schiestl 2000; Volik et al. 2003). These two reasons provide the first point of difficulty when attempting to find contiguous sequence in these areas. Sequence repetitions also allow secondary and tertiary structures to form, such as Z DNA, palindromes and kinkable dinucleotide steps (Razin et al. 2001), which cause difficulty in propagating and packaging the eukaryotic sequence with bacterial machinery and hosts.  By exploring repetitive sequence distribution on a chromosome, I was able to detect the co-occurrence of areas that were originally only clonable using YAC vectors and increased levels of repetitive elements. 
By comparing protein coding gene positions 59  with regions 40 kb upstream and downstream from each end to provide non-coding sequence, I was able to see a co-incidence of Yeast sourced clones and repetitive elements in the same areas. The X chromosome had 5% of genes that were initially derived from YAC clones and 5% of sequence around the genes came from the same source. However, 7% of repetitive elements, falling within 40 kb of a gene, were around a protein-coding sequence initially cloned in a YAC. This implies a 48% enrichment in the number of repetitive elements within 40 kb of the ends of YAC derived genes compared to the rest of the genome. This enrichment increases when moving from the originally YAC derived sequence to the individual gaps within libraries, and finally to the combined gaps when fosmids are tiled together (Table 6). The increasing quantity of repetitive elements in the non-covered regions suggests the clustered repeats may be preferentially excluded from the libraries. This is not surprising as repetitive regions are expected to be difficult to clone or align, and are likely the cause of the difficulty in filling these gaps.  4.2 Protein Coding Gene Coverage and Gaps  Gaps in contiguous fosmid sequences were analyzed to determine how many proteincoding genes are excluded from each library. Independently the two libraries cover about 85% of the genes (Table 3), but when combined the coverage jumps to about 93%. This increase will provide better access to DNA that was previously 60  unavailable, within bacterial vectors, and more complete access to genes in these regions. The increase in protein coding region coverage, when the libraries’ contigs are combined, indicates that at least a portion of the sequence believed to be unclonable (Waterston and Sulston 1995) was simply mismapped or difficult to propagate in bacteria. 
However, the incomplete overlap suggests that there are regions of the chromosome that may be unclonable or are, more likely, very difficult to clone.

The numbers of genes in the libraries will most likely increase due to re-annotation, as has been seen from the WS140 alignment of the WRM06 library, as well as new insights in the literature describing regions and repetitive sequence. One such example was described by Vergara et al. (2009). Their analysis of the genomes of many different laboratory strains, as well as CB4856, noted several duplications arising within the C. elegans community’s N2 strains. They were able to trace these events to the original labs that disseminated the N2 stocks and concluded that there were several tandem duplications within the genome of the laboratory strain used for the sequencing project. These duplications are only present in a subset originating from a single lab, providing further support for the idea that genetic drift is prevalent among laboratory N2 strains (see also Denver et al. 2009; Flibotte et al. 2010; Hillier et al. 2008). Interestingly, they also showed these duplications are missing from the Hawaiian strain (Vergara et al. 2009). Two tandem duplications differentiating the sequenced N2 from our VC196 and CB4856 can be detected in the two fosmid libraries.

The larger of these duplications lies on chromosome V from 2,347,883 to 2,562,875. The combined fosmid libraries have a gap on chromosome V for this region. Further analysis revealed seven individual clones from the Hawaiian strain that cover this interval. These clones were originally dismissed as too large as they have calculated sizes of 141 kb to 149 kb. The tandem repeat was discovered to be 106,707 bp in length, which would place these clones at a normally expected length (roughly 34 kb to 42 kb) if the repeat is not present in this strain, as was shown by Vergara et al. (2009).
Still further investigation showed 6 clones from this region in the N2 library, with similar lengths, suggesting that this 106 kb tandem repeat is not contained in the VC196 wild type genome.

A smaller duplication was also detected on chromosome V between 8,813,143 and 8,892,906. There are 14 clones covering this region in the WRM06 library, and all were excluded from alignment in contigs as they had calculated sizes between 68 kb and 74 kb. Accounting for the duplication size of 37,642 bp, these would again be deemed valid clones (roughly 30 kb to 36 kb) if the repeat were missing. The WRMHS library has 6 clones covering this region with inserts calculated from 72 to 85 kb. Because these inserts have paired ends aligning to this region with calculated sizes beyond what λ phage is able to package (Feiss et al. 1977), they provide evidence that these tandem repeats are not present in the libraries and therefore the genomes. These repeats were not seen in CB4856 in the study (Vergara et al. 2009). These two different tandem repeats are associated with 36 protein-coding genes. Because these regions are not repeated in these strains, the total number of genes expected in the libraries will be decreased by approximately the same number. These two instances are unlikely to be unique.

As the genome is re-annotated, coverage in the libraries will change and may encompass more coding sequence. Other tandem repeats described in the same paper (Vergara et al. 2009) were not readily visible within the libraries as gaps, most likely due to their small size. Further study of the Hawaiian clones determined to be outside of a suitable size for packaging by the λ phage particles showed other possible candidates, including either tandem repeat areas within the N2 library not held within the Hawaiian strain, or simple deletions of sequence. Chromosome IV contains an anomaly spanning 14 genes and deleting 41.9 kb of sequence (Maydan et al. 2010).
Supportive evidence for this being a simple deletion was provided by three clones that were initially discarded as too large, measuring between 77 and 80 kb. The calculated size did not take into consideration the deletion interval originally. By factoring in the missing sequence, the clones are of the expected compatible size.  63  4.3 Correlation of Gaps with YAC-derived Coding Sequences  Even after combining the two libraries, there are several extant gaps. This may be due to a condition described to me in a personal communication with Robert Waterston. While sequencing the worm’s genome, portions of the chromosomes were unable to be packaged or propagated within a bacterial cell and could only be cloned within Yeast Artificial Chromosomes (YAC). The incomplete overlap suggests that there are regions of the chromosome that are unclonable (Waterston and Sulston 1995) or, more likely, very difficult to clone within bacterial vectors.  During sequencing of the first N2 genome (Consortium 1998), even with a six fold redundant coverage of the genome in cosmids, non-random gaps in contiguous sequence persisted. Unsuccessful efforts were made to fill these gaps using cosmid and fosmid clones. In these cases YAC vectors were used to complete the genomic coverage. Our approach of using fosmid libraries prepared from two different geographic isolates reduced the number of gaps but still left a substantial number of holes in coverage along all the chromosomes. The apparent inability of fosmids or cosmids to cover all regions of the C. elegans genome remains. By exploring YACderived sequence co-occurrence within the libraries, it should be possible to see if the gaps in the original cosmid libraries are the same as those seen in our fosmid libraries. 
64  By comparing the number of genes initially derived only in yeast sourced clones to all other coding regions found in library gaps it should be possible to determine if these sequences are still evading coverage in bacterial vectors. There is a marked enrichment of the coding sequences only cloned originally in YAC-derived vectors within the fosmid gaps. 81% of coding sequence genes, originally only cloned in YACs, are covered in the combined libraries (table 4). The enrichment shows the phenomenon described previously (Consortium 1998; Waterston and Sulston 1995) has persisted, using a complete fosmid clone set. When the libraries’ coverage is combined, the enrichment is greater. This lends credence to the suggestion that some of these sequences, which were originally only YAC derived, may be unclonable in bacterial hosts (Waterston and Sulston 1995). What is more likely is that they were hard to clone, or tricky to propagate within the strain originally used, or were difficult to clone using the production and packaging system adopted in the sequencing project.  While the gapped sequences within the combined libraries were proportionally higher in YAC derived sequence, we were able to clone 2,638 coding regions initially believed to be unclonable in bacteria. Even alone, with a 4.2X and 3.9X theoretical coverage, the N2 and CB4856 libraries were able to capture 1,776 and 2,218 genes respectively, which were previously only represented in YAC vectors. This result shows that some genes not previously captured within cosmids are clonable, challenging previous assertions (Waterston and Sulston 1995). This provides evidence 65  that suggest the protocol, genome or bacterial strain and vector, used for producing a library affect the resulting coverage in that library. 
This also reinforces that there was likely a difficulty with cloning certain sequences or propagating them in the strain originally used in the sequencing libraries, which may also have been caused by the production or packaging systems adopted at the time.  A study performed on multiple vertebrate genomes, using BAC clones, with both copy control and multicopy vectors to limit the copy number of individual sequences within the host strain showed sequences that were difficult to package are made less problematic when maintained with fewer copies in the host (Blakesley et al. 2010). This is likely due to the interaction of library sequences within the host, causing secondary and tertiary structures that will stall or freeze replication machinery, and/or cause repair mechanisms to alter the sequence then propagated, or halt replication of the cell altogether. If the phenomenon were entirely due to recombination, the inserts would likely contain internal rearrangements that would remain imperceptible after end sequencing if carried in a multicopy vector. Empty vectors may also be a symptom of such a rearrangement and these were not observed. The complete lack of inserts for some areas points to an inability in packaging or propagation of those regions. The use of a single copy vector may be the reason the WRMHS and WRM06 fosmid libraries were able to capture these previously unclonable sequences. Even though fosmid libraries were used in the original sequencing, the relative proportion, 113 fosmids to 2,527 cosmids (Consortium 1998), would have made them unlikely to 66  have an effect on the overall genome coverage. A hint at the possibility of difficult sequences cloning better in fosmids was made when it was described that fosmids allowed a third of the gaps found in the central region of the chromosomes, but not at the ends, to be bridged (Consortium 1998). The results described in this study support this conclusion.  
The possibility remains that the lack of complete coverage for either the fosmid or the cosmid libraries is a result of a depth of sequencing issue. This may account for the differential presentation of gaps in contiguous sequence seen in the cosmids as well as the N2 and CB4856 fosmids. One piece of data provides evidence to the contrary. Gap positions on the genome remain largely unchanged from one library to another. An even distribution of clones over the genome would be unlikely to result in three separate libraries with regions of overlap and gaps at similar positions with little variation between them. It might be possible to cover some gapped regions more completely by producing a larger library and sequencing it to a greater depth. However, the lack of genomic coverage with a bacterial vector and the almost complete YAC based coverage, with far fewer clones proportionally, suggest the problem lies in the packaging or propagation in the specific host.

The increased coverage seen from the N2 to the CB4856 fosmids may have been due to the larger quantity of aligned sequences offsetting the decreased likelihood that the more difficult regions are cloned. A similar increase may have been visible if another N2 library were produced and sequenced to the same depth as the Hawaiian library. To explore this, a chi squared test was performed on the proportion of genes found in the WRM06 and WRMHS libraries. The test showed we are able to reject the null hypothesis and support the alternate, that the proportion of YAC and cosmid derived genes was dependent on the library producing them. This does not separate the genome used in the library from the method used to produce it and may easily be misconstrued as one or the other. However, decoupling the use of different genomes and different purification methods is not possible with the current study and may need support to provide conclusive evidence.
By using next generation sequencers, as was described earlier, it would be possible to sequence a larger library to a much greater depth than was possible in this study. The greater depth could allow the investigator to probe the differences seen with another geographic isolate, by making and sequencing libraries produced independently with the different purification techniques and strains. This would thereby separate the variables to determine their individual contribution to the coverage.  4.4 Alignment Difficulties  Looking at the permutations created in some end sequence pairs it becomes apparent that excluding clones that align in unconventional ways may be premature. The multiply aligned end sequences are the most difficult hurdle to overcome in any 68  sequencing or genomic study. Repetitive sequences leading to miscalculation or improper addressing of ends are a difficult problem, especially with the smaller insert sizes common for next generation sequencing technology. The paired end sequencing performed here showed substantial issues with 500 bp ends and 30 kb to 45 kb overall length. Pairing next generation sized sequences may not minimize the issue.  It is quite clear that given the complexity of some genomes, the BLAST algorithm may not be ideal as a basis for alignment of sequence with some repetitive elements. The number of clones that initially showed ends aligning to separate chromosomes supports this. When the permutations of end alignments in this group were explored, the possibility of misalignments became more apparent. The 123,268 permutations falling between 10 kb and 60 kb on the same chromosome suggest the unlikelihood of the original alignment to separate chromosomes being correct. The end placements may, however, have been correct initially and a more in depth study of the structure of these inserts should define their actual position.  
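The permutation counts discussed above come from pairing every hit for one end of a clone with every hit for the other end and keeping only the plausible placements. A minimal Python sketch of such a filter is given here, following the three rules stated in section 3.7 (same chromosome, ends on opposite strands facing each other, ends between 10 kb and 60 kb apart). It is not the original Perl program: the tuple layout for a hit, the function name `valid_placements`, and the example coordinates are all assumptions made for illustration.

```python
from itertools import product
from typing import List, Tuple

# A hit is (chromosome, position, strand); this layout is assumed for the sketch.
Hit = Tuple[str, int, str]


def valid_placements(end_a_hits: List[Hit], end_b_hits: List[Hit],
                     min_size: int = 10_000,
                     max_size: int = 60_000) -> List[Tuple[Hit, Hit, int]]:
    """Return every pairing of hits that forms a conventional placement."""
    placements = []
    for a, b in product(end_a_hits, end_b_hits):
        if a[0] != b[0]:
            continue  # ends fall on different chromosomes
        first, second = sorted((a, b), key=lambda hit: hit[1])
        if not (first[2] == "+" and second[2] == "-"):
            continue  # same direction, or facing away from one another
        size = second[1] - first[1]
        if min_size <= size <= max_size:
            placements.append((a, b, size))
    return placements


# Hypothetical hits for the two ends of one clone.
end_a = [("II", 1_000_000, "+"), ("IV", 5_000_000, "+")]
end_b = [("IV", 5_035_000, "-"), ("IV", 9_000_000, "-"), ("II", 1_002_000, "-")]
placements = valid_placements(end_a, end_b)
```

Under these assumptions, a clone for which the function returns an empty list would correspond to the clones with no valid single-chromosome permutation, while a clone with many returned placements is one whose best-hit address deserves less confidence.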
As previously described, the clones not showing multiple alignment positions are more likely correct than those with multiple calculated positions. The list of clones in this category was checked for multiple fosmids aligning to the same end positions. Two candidate clones were found. With end positions relating to chromosomes IV and II, and within close proximity to a gapped region on IV, these may indicate a translocation event in the WRMHS library. Further, analysis revealed that the region 69  falling between the two clones unpaired end positions aligning on chromosome IV contains a deletion, discovered by CGH (Maydan et al. 2010). The deletion position on chromosome IV provides evidence that the translocation is unbalanced and affects at least three genes. It seems likely this would be an insertional translocation as there is no genetic evidence from crosses between these two strains of a translocation of chromosome arms.  Repetitive sequences are problematic for alignment as well as de novo sequencing and lead to challenges in cloning. These issues are possibly reflected in the concentration of repetitive elements in the gaps found in the libraries. Using alternative low copy vectors has alleviated the problem, to some extent, but does not eliminate it. BLAST software is not entirely able to differentiate between like sequences and new alignment algorithms might be necessary to empower paired end sequencing with inter-sequence spacing. The intervals used also have to be larger than the largest individual sequence repeated, which can be quite large as seen in the tandem duplications apparent in the canonical sequence. This being said, the libraries produced are likely not representative of the total clonable and alignable sequence. There are still proteincoding genes, originally derived in cosmid sequences, which have not been captured within fosmids. 
These coding regions are represented by the 19% of genes falling into gaps in combined coverage that were not originally YAC derived. The missing open reading frames suggest that at least the sequence around these 308 protein-coding genes should be clonable. The sequence may be found within the groups of clones containing unconventional inserts, but may have been missed in these two libraries.

An example of a missed coding sequence, within the cosmid and WRM06 libraries, was seen with the unc-119 gene. One reason these sequences may have been missed in this study is that they are unclonable within the vectors and hosts employed in this investigation. Further research on these gapped sequences may provide the answer. The quality attained in the WRMHS library suggests that any further attempt to fill the gaps seen in contiguous fosmid coverage should vary the DNA purification and polishing protocols. This, combined with the use of a copy control vector, would likely give the best chance of covering the gapped regions.

In conclusion, I have created two fosmid libraries: the WRM06 library, made in 2005 from the N2 variant of C. elegans VC196, and the WRMHS library, made as part of my thesis from the CB4856 geographic isolate. They were mapped to the WS210 genome and together cover 92.8% of genes. Both libraries show coverage of regions of C. elegans DNA previously unclonable in bacterial vectors. The remaining gaps in contiguous sequence for the original cosmid library and for each of the fosmid libraries, independently and combined, show an increasing concentration of repetitive elements. These repeat sequences may be the greatest cause of difficulty in cloning and propagation. These trends have been seen in other organisms.

Of the two libraries I produced, WRMHS was superior based on clone mean size and standard deviation as well as genomic coverage and gap size and standard deviation.
The WRMHS library was able to cover 115 gaps in contiguous coverage within the WRM06 library and, with them, 1893 genes not contained in fosmids. I was also able to cover 2600 genes not previously captured in bacterial vectors. WRMHS differed from WRM06 in the strain used to produce DNA as well as in the post-purification polishing regimes performed to limit the effect that using only one type of purification might have on the resulting library. Because multiple variables were changed, it is not possible to distinguish which specific variable caused the significant increase in capture of previously uncloned DNA. Next generation sequencing technology could possibly differentiate the relative importance of the different variables. The library has also captured regions containing several larger chromosomal differences between N2 and CB4856, including deletions and duplications, examples of which can be seen in several fosmids. Further exploration of these regions may be performed by sequencing the clones aligning to the regions or by sequencing groups of clones that align unconventionally. This technique may be empowered by using the program I designed, described in section 3.7, to increase the likelihood that the unconventional alignments are not due entirely to mismapped repeat regions. This would have the added benefit of decreased sequencing cost and/or man hours for subcloning. The sequencing may also uncover other clones containing genes of interest or regions in gaps.

References

Bashir, A., S. Volik, C. Collins, V. Bafna and B. J. Raphael, 2008 Evaluation of Paired-End Sequencing Strategies for Detection of Genome Rearrangements in Cancer. PLoS Comput Biol 4: e1000051.
Bauchwitz, R., and F. Costantini, 1998 YAC Transgenesis: A Study of Conditions to Protect YAC DNA from Breakage and a Protocol for Transfection. Biochimica Et Biophysica Acta-Molecular Cell Research 1401: 21-37.
Bishop, A. J. R., and R. H. 
Schiestl, 2000 Homologous Recombination as a Mechanism for Genome Rearrangements: Environmental and Genetic Effects. Human Molecular Genetics 9: 2427-2434.
Blakesley, R. W., N. F. Hansen, J. Gupta, J. C. McDowell, B. Maskeri et al., 2010 Effort Required to Finish Shotgun-Generated Genome Sequences Differs Significantly Among Vertebrates. BMC Genomics 11: -.
Blakesley, R. W., N. F. Hansen, J. C. Mullikin, P. J. Thomas, J. C. McDowell et al., 2004 An Intermediate Grade of Finished Genomic Sequence Suitable for Comparative Analyses. Genome Res 14: 2235-2244.
Brenner, S., 1974 Genetics of Caenorhabditis elegans. Genetics 77: 71-94.
Caenorhabditis Genetics Center web site, https://dbw6.msi.umn.edu/cgcdb/strain.php?id=7525, Aug 30 2007
Celniker, S. E., L. A. L. Dillon, M. B. Gerstein, K. C. Gunsalus, S. Henikoff et al., 2009 Unlocking the Secrets of the Genome. Nature 459: 927-930.
Chalfie, M., Y. Tu, G. Euskirchen, W. W. Ward and D. C. Prasher, 1994 Green Fluorescent Protein as a Marker for Gene Expression. Science 263: 802-805.
Consortium, C. e. G. S., 1998 Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology. Science 282: 2012-2018.
Coulson, A., J. Sulston, S. Brenner and J. Karn, 1986 Toward a Physical Map of the Genome of the Nematode Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America 83: 7821-7825.
Denver, D. R., P. C. Dolan, L. J. Wilhelm, W. Sung, J. I. Lucas-Lledo et al., 2009 A Genome-Wide View of Caenorhabditis elegans Base-Substitution Mutation Processes. Proceedings of the National Academy of Sciences of the United States of America 106: 16310-16314.
Denver, D. R., K. Morris and W. K. Thomas, 2003 Phylogenetics in Caenorhabditis elegans: An Analysis of Divergence and Outcrossing. Molecular Biology and Evolution 20: 393-400.
Dolphin, C. T., and I. A. Hope, 2006 Caenorhabditis elegans Reporter Fusion Genes Generated by Seamless Modification of Large Genomic DNA Clones. 
Nucleic Acids Research 34: -.
Ellis, H. M., and H. R. Horvitz, 1986 Genetic Control of Programmed Cell Death in the Nematode C. elegans. Cell 44: 817-829.
Feiss, M., R. A. Fisher, M. A. Crayton and C. Egner, 1977 Packaging of Bacteriophage-Lambda Chromosome - Effect of Chromosome Length. Virology 77: 281-293.
Fire, A., and R. H. Waterston, 1989 Proper Expression of Myosin Genes in Transgenic Nematodes. EMBO Journal 8: 3419-3428.
Fire, A., S. Q. Xu, M. K. Montgomery, S. A. Kostas, S. E. Driver et al., 1998 Potent and Specific Genetic Interference by Double-Stranded RNA in Caenorhabditis elegans. Nature 391: 806-811.
Flibotte, S., M. L. Edgley, I. Chaudhry, J. Taylor, S. E. Neil et al., 2010 Whole-Genome Profiling of Mutagenesis in Caenorhabditis elegans. Genetics 185: 431-441.
Flibotte, S., M. L. Edgley, J. Maydan, J. Taylor, R. Zapf et al., 2009 Rapid High Resolution Single Nucleotide Polymorphism-Comparative Genome Hybridization Mapping in Caenorhabditis elegans. Genetics 181: 33-37.
Flicek, P., and E. Birney, 2009 Sense from Sequence Reads: Methods for Alignment and Assembly. Nature Methods 6: S6-S12.
Fraser, A. G., R. S. Kamath, P. Zipperlen, M. Martinez-Campos, M. Sohrmann et al., 2000 Functional Genomic Analysis of C. elegans Chromosome I by Systematic RNA Interference. Nature 408: 325-330.
Giraldo, P., and L. Montoliu, 2001 Size Matters: Use of YACs, BACs and PACs in Transgenic Animals. Transgenic Research 10: 83-103.
Hillier, L. W., G. T. Marth, A. R. Quinlan, D. Dooling, G. Fewell et al., 2008 Whole-Genome Sequencing and Variant Discovery in C. elegans. Nature Methods 5: 183-188.
Huang, R. Y., S. J. Boulton, M. Vidal, S. C. Almo, A. R. Bresnick et al., 2003 High-Throughput Expression, Purification, and Characterization of Recombinant Caenorhabditis elegans Proteins. Biochem Biophys Res Commun 307: 928-934.
Kim, U. J., H. Shizuya, P. J. Dejong, B. Birren and M. I. 
Simon, 1992 Stable Propagation of Cosmid Sized Human DNA Inserts in an F-Factor Based Vector. Nucleic Acids Research 20: 1083-1085.
Kinscherf, T. G., M. N. Yap, A. O. Charkowski and D. K. Willis, 2009 CHEF Procedures: A Rapid High-Temperature Method for Sample Preparation, a High Voltage HEPES Buffer System and the Use of Nusieve (R) Agarose. Journal of Rapid Methods and Automation in Microbiology 17: 9-16.
Li, S., C. M. Armstrong, N. Bertin, H. Ge, S. Milstein et al., 2004 A Map of the Interactome Network of the Metazoan C. elegans. Science 303: 540-543.
Luan, C. H., S. Qiu, J. B. Finley, M. Carson, R. J. Gray et al., 2004 High-Throughput Expression of C. elegans Proteins. Genome Res 14: 2102-2110.
Maydan, J. S., S. Flibotte, M. L. Edgley, J. Lau, R. R. Selzer et al., 2007 Efficient High-Resolution Deletion Discovery in Caenorhabditis elegans by Array Comparative Genomic Hybridization. Genome Research 17: 337-347.
Maydan, J. S., A. Lorch, M. L. Edgley, S. Flibotte and D. G. Moerman, 2010 Copy Number Variation in the Genomes of Twelve Natural Isolates of Caenorhabditis elegans. BMC Genomics 11: 62.
Mcnamara, P. T., A. Bolshoy, E. N. Trifonov and R. E. Harrington, 1990 Sequence-Dependent Kinks Induced in Curved DNA. Journal of Biomolecular Structure & Dynamics 8: 529-538.
Mello, C. C., J. M. Kramer, D. Stinchcomb and V. Ambros, 1991 Efficient Gene Transfer in C. elegans - Extrachromosomal Maintenance and Integration of Transforming Sequences. EMBO Journal 10: 3959-3970.
Metzker, M. L., 2010 Sequencing Technologies - the Next Generation. Nature Reviews Genetics 11: 31-46.
Moerman, D. G., and R. J. Barstead, 2008 Towards a Mutation in Every Gene in Caenorhabditis elegans. Brief Funct Genomic Proteomic 7: 195-204.
Murray, V., C. Monchawin and P. R. England, 1993 The Determination of the Sequences Present in the Shadow Bands of a Dinucleotide Repeat PCR. Nucleic Acids Research 21: 2395-2398.
Neil, D. L., A. Villasante, R. 
B. Fisher, D. Vetrie, B. Cox et al., 1990 Structural Instability of Human Tandemly Repeated DNA Sequences Cloned in Yeast Artificial Chromosome Vectors. Nucleic Acids Research 18: 1421-1428.
Odelberg, S. J., R. B. Weiss, A. Hata and R. White, 1995 Template Switching during DNA Synthesis by Thermus aquaticus DNA Polymerase I. Nucleic Acids Research 23: 2049-2057.
Pel, J., D. Broemeling, L. Mai, H. L. Poon, G. Tropini et al., 2009 Nonlinear Electrophoretic Response Yields a Unique Parameter for Separation of Biomolecules. Proceedings of the National Academy of Sciences of the United States of America 106: 14796-14801.
Razin, S. V., E. S. Ioudinkova, E. N. Trifonov and K. Scherrer, 2001 Non-Clonability Correlates With Genomic Instability: A Case Study of a Unique DNA Region. Journal of Molecular Biology 307: 481-486.
Seidel, H. S., M. V. Rockman and L. Kruglyak, 2008 Widespread Genetic Incompatibility in C. elegans Maintained by Balancing Selection. Science 319: 589-594.
Song, J. Q., F. G. Dong, J. W. Lilly, R. M. Stupar and J. M. Jiang, 2001 Instability of Bacterial Artificial Chromosome (BAC) Clones Containing Tandemly Repeated DNA Sequences. Genome 44: 463-469.
Sulston, J. E., and H. R. Horvitz, 1977 Post-Embryonic Cell Lineages of the Nematode, Caenorhabditis elegans. Developmental Biology 56: 110-156.
Sulston, J. E., E. Schierenberg, J. G. White and J. N. Thomson, 1983 The Embryonic Cell Lineage of the Nematode Caenorhabditis elegans. Developmental Biology 100: 64-119.
Swan, K. A., D. E. Curtis, K. B. McKusick, A. V. Voinov, F. A. Mapa et al., 2002 High-Throughput Gene Mapping in Caenorhabditis elegans. Genome Res 12: 1100-1105.
Timmons, L., and A. Fire, 1998 Specific Interference by Ingested dsRNA. Nature 395: 854-854.
Tursun, B., L. Cochella, I. Carrera and O. Hobert, 2009 A Toolkit and Robust Pipeline for the Generation of Fosmid-Based Reporter Genes in C. elegans. PLoS One 4: -.
Tuzun, E., A. J. Sharp, J. A. Bailey, R. Kaul, V. A. 
Morrison et al., 2005 Fine-Scale Structural Variation of the Human Genome. Nature Genetics 37: 727-732.
Vaglio, P., P. Lamesch, J. Reboul, J. F. Rual, M. Martinez et al., 2003 WorfDB: the Caenorhabditis elegans ORFeome Database. Nucleic Acids Research 31: 237-240.
Vergara, I. A., A. K. Mah, J. C. Huang, M. Tarailo-Graovac, R. C. Johnsen et al., 2009 Polymorphic Segmental Duplication in the Nematode Caenorhabditis elegans. BMC Genomics 10: -.
Volik, S., S. Zhao, K. Chin, J. H. Brebner, D. R. Herndon et al., 2003 End-Sequence Profiling: Sequence-Based Analysis of Aberrant Genomes. Proc Natl Acad Sci U S A 100: 7696-7701.
Waterston, R., and J. Sulston, 1995 The Genome of Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America 92: 10836-10840.
Wicks, S. R., R. T. Yeh, W. R. Gish, R. H. Waterston and R. H. A. Plasterk, 2001 Rapid Gene Mapping in Caenorhabditis elegans Using a High Density Polymorphism Map. Nature Genetics 28: 160-164.
Wormbase website, http://www.wormbase.org, release WS140, Mar 26 2005
Wormbase website, http://www.wormbase.org, release WS210, Dec 22 2009
Yokobata, K., B. Trenchak and P. J. Dejong, 1991 Rescue of Unstable Cosmids by Invitro Packaging. Nucleic Acids Research 19: 403-404.
Zhang, Y., L. Nash and A. L. Fisher, 2008 A Simplified, Robust, and Streamlined Procedure for the Production of C. elegans Transgenes Via Recombineering. BMC Developmental Biology 8: -.

Appendices

Appendix A: Gap Positions for Each Library

Figure A1: Graphic depiction of the gap distribution across the chromosomes for the WRM06 library. Top left, in red, shows chromosome I. Beside it, in coral, is chromosome II. Middle left, in yellow, shows chromosome III. Beside it, in green, is chromosome IV. The bottom left has chromosome V and beside it, in purple, is chromosome X. The X axis shows the position on the chromosome at which each gap lies. The Y axis shows the size of each gap.
Figure A2: Graphic depiction of the gap distribution across the chromosomes for the WRMHS library. Top left, in red, shows chromosome I. Beside it, in coral, is chromosome II. Middle left, in yellow, shows chromosome III. Beside it, in green, is chromosome IV. The bottom left has chromosome V and beside it, in purple, is chromosome X. The X axis shows the position on the chromosome at which each gap lies. The Y axis shows the size of each gap.

Appendix B: Examples of Unconventional Inserts

Table B1: Fosmids with alignments too large or small to be packaged by λ phage

Fosmid Name WRMHS30K04 WRMHS28E04 WRMHS26B06 WRMHS07A13 WRMHS28A07 WRMHS26J01 WRMHS29O16 WRMHS14D06 WRMHS02B15 WRMHS01K13 WRMHS28C22 WRMHS29B07 WRMHS03G17 WRMHS10C14 WRMHS12K15 WRMHS26G11 WRMHS07F17 WRMHS28O16 WRMHS29D17 WRMHS28A08 WRMHS26F03 WRMHS29H23 WRMHS29J10 WRMHS10I15 WRMHS29E14 WRMHS26E04 WRMHS35J10 WRMHS09E20 WRMHS12C19 WRMHS32C01 WRMHS27O09 WRMHS07F05  Sequence Start 933524 933601 933661 933931 934000 934206 934281 934770 3990050 4335006 4542593 5152953 5160628 10132364 10196036 10204503 10204671 10206735 10210676 10211621 10211952 10217055 10218261 10226103 10233153 10265632 10267314 10267964 10268266 10268595 10268604 10270526  Bit Score 1242 1053 473 444 1038 941 723 660 767 1254 1064 1247 1031 604 1269 793 992 1081 1201 1260 1454 872 909 1327 1016 1009 880 811 555 1339 1330 1116  Sequence End 934554 934457 934858 933659 934199 934221 933798 924816 3998412 4341933 4543272 5158235 5152940 10118924 10209331 10212416 10215550 10218431 10208681 10206492 10212860 10208971 10220628 10217348 10218954 10273582 10254869 10278891 10281167 10281122 10277904 10277850  Bit Chromosome Score 1166 I 1002 I 593 I 472 I 1027 I 725 I 1105 I 872 I 1258 I 1421 I 1197 I 1249 I 1109 I 1386 I 374 I 933 I 1375 I 1284 I 1230 I 1234 I 1068 I 872 I 1013 I 1303 I 1411 I 922 I 1029 I 1264 I 1363 I 1443 I 1537 I 1424 I  Gap* True True True True True True True Gap Gap True True True Gap Gap True True True True True 
True True True True True True True True True True True True True  Size 1030 856 1197 275 1112 976 664 9954 8362 6927 821 5282 7688 13440 13295 7913 10879 11696 1995 5129 908 8084 2367 8755 14199 7950 12445 10927 12901 12527 9300 7324  83  Fosmid Name WRMHS18H23 WRMHS06M06 WRMHS26D06 WRMHS06O02 WRMHS29O24 WRMHS29A21 WRMHS20L02 WRMHS08P20 WRMHS25G05 WRMHS37J05 WRMHS29B05 WRMHS30A22 WRMHS30I17 WRMHS40A11 WRMHS16G05 WRMHS33F07 WRMHS30I13 WRMHS28D08 WRMHS07P16 WRMHS37N19 WRMHS19I22 WRMHS13N05 WRMHS05J11 WRMHS19I23 WRMHS03D10 WRMHS34N08 WRMHS28F20 WRMHS32H02 WRMHS05M17 WRMHS26A17 WRMHS10C18 WRMHS26M24 WRMHS09N10 WRMHS36D14 WRMHS23M15 WRMHS07L15 WRMHS26O11 WRMHS30O20 WRMHS19L08 WRMHS21I12 WRMHS27G09 WRMHS29J14 WRMHS28E01 WRMHS09O06  Sequence Start 10270614 10274741 10275926 10275979 10276007 10281433 10289888 10946239 13244074 15056645 15059595 15060299 15060309 15060333 15060346 15060347 15060350 15060359 15060393 15060415 15060469 15060495 15060538 15060555 15060560 15060591 15060603 15060616 15060637 15060700 15060703 15060707 15060719 15060721 15060745 15060755 15060755 15060771 15060787 15060802 15060817 15061006 15061020 15061031  Bit Score 1027 1345 1441 1325 1120 1397 797 503 1068 1260 1256 1330 1565 1520 1471 1495 1192 1345 1236 1354 1238 1262 1264 1411 1016 1375 1386 1463 1330 1260 1421 1249 1338 1352 1234 1236 571 1345 1328 1085 1358 1182 1410 1173  Sequence End 10273626 10284918 10270700 10268200 10268308 10273848 10275187 10949249 13257861 15067799 15065105 15065707 15066762 15062234 15065780 15068044 15067650 15065455 15065202 15063273 15062893 15064133 15061229 15065117 15066500 15063479 15067364 15062655 15063474 15065002 15062484 15061700 15066201 15066913 15064067 15061134 15063572 15062145 15065696 15067973 15064153 15066062 15063774 15066345  Bit Chromosome Score 1219 I 1349 I 857 I 1090 I 966 I 1155 I 767 I 440 I 1456 I 1308 I 1533 I 1437 I 1456 I 1411 I 1375 I 1533 I 1526 I 1393 I 1428 I 1321 I 1402 I 1020 I 1373 I 1541 I 1332 I 1402 I 1439 I 1452 I 
1264 I 1301 I 1500 I 1375 I 1282 I 1210 I 1439 I 1341 I 1314 I 1502 I 1572 I 1280 I 1321 I 1338 I 1397 I 1382 I  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True  Size 3012 10177 5226 7779 7699 7585 14701 3010 13787 11154 5510 5408 6453 1901 5434 7697 7300 5096 4809 2858 2424 3638 752 4562 5940 2888 6761 2039 2837 4302 1781 993 5482 6192 3322 1032 2817 1374 4909 7171 3336 5056 2754 5314  84  Fosmid Name WRMHS28D14 WRMHS31N17 WRMHS33B20 WRMHS20C04 WRMHS05B03 WRMHS06L12 WRMHS31G16 WRMHS29H17 WRMHS26B11 WRMHS30O18 WRMHS29C22 WRMHS01P11 WRMHS34O06 WRMHS29E04 WRMHS06L23 WRMHS29G01 WRMHS11B09 WRMHS36D19 WRMHS16E22 WRMHS29H22 WRMHS09P22 WRMHS29B11 WRMHS26B14 WRMHS13F08 WRMHS31H02 WRMHS17B02 WRMHS30J22 WRMHS26L07 WRMHS27A02 WRMHS08H11 WRMHS09F14 WRMHS32M09 WRMHS30O06 WRMHS11E15 WRMHS29M07 WRMHS01I18 WRMHS09E18 WRMHS29H01 WRMHS29C03 WRMHS25J05 WRMHS37H03 WRMHS04M06 WRMHS30H12 WRMHS06O06  Sequence Start 15061050 15061094 15061109 15061109 15061124 15061133 15061158 15061175 15061180 15061225 15061267 15061278 15061284 15061355 15061362 15061380 15061380 15061381 15061434 15061539 15061634 15061663 15061681 15061752 15061766 15061774 15061844 15061990 15061994 15061998 15062005 15062011 15062014 15062033 15062054 15062072 15062110 15062138 15062156 15062176 15062186 15062260 15062268 15062288  Bit Score 1136 1539 1408 1053 1264 1269 1463 1360 1262 1387 1312 1262 959 1293 1275 723 377 1448 1352 1369 1247 1146 1443 1227 1522 1363 1445 1535 1472 1249 1123 1591 1607 1133 1397 1098 1194 1175 1448 1358 1236 1293 1341 1325  Sequence End 15062757 15067942 15062374 15068111 15061427 15069034 15047276 15066262 15066238 15063509 15062498 15062502 15065839 15064415 15062412 15062444 15062778 15067158 15064763 15060582 15064768 15066665 15060855 15060466 15062358 15067180 15063891 15064764 15063734 15067938 
15067857 15065363 15062669 15067245 15064641 15061098 15065420 15062977 15066994 15064281 15062201 15060961 15066152 15062752  Bit Chromosome Score 1531 I 1458 I 1458 I 1315 I 1323 I 1251 I 1327 I 1358 I 1262 I 1406 I 1432 I 1347 I 1474 I 983 I 1349 I 1456 I 1411 I 1419 I 1471 I 1295 I 1288 I 1384 I 1391 I 1210 I 1467 I 1435 I 1391 I 1134 I 1445 I 1363 I 1295 I 1450 I 1493 I 1546 I 1574 I 1365 I 1267 I 1472 I 1482 I 1393 I 1304 I 1201 I 1454 I 1086 I  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True  Size 1707 6848 1265 7002 1111 7901 13882 5087 5058 2284 1231 1224 4555 3060 1050 1064 1398 5777 3329 957 3134 5002 826 1286 1027 5406 2047 2774 1740 5940 5852 3352 1029 5212 2587 974 3310 839 4838 2105 1369 1299 3884 845  85  Fosmid Name WRMHS32A07 WRMHS22P12 WRMHS36I20 WRMHS09F11 WRMHS40B06 WRMHS03G07 WRMHS20B05 WRMHS08K13 WRMHS07C10 WRMHS25E13 WRMHS33P16 WRMHS05I17 WRMHS10B20 WRMHS21G16 WRMHS17N05 WRMHS28M07 WRMHS30M24 WRMHS29O01 WRMHS28D07 WRMHS28P24 WRMHS30P01 WRMHS06F23 WRMHS26C16 WRMHS27F15 WRMHS30E04 WRMHS28D03 WRMHS30I03 WRMHS02J08 WRMHS30L10 WRMHS12P19 WRMHS05P11 WRMHS30G10 WRMHS40H06 WRMHS26K05 WRMHS27D24 WRMHS39K05 WRMHS27E09 WRMHS26J16 WRMHS27N04 WRMHS06A10 WRMHS30N23 WRMHS27H01 WRMHS26N23 WRMHS28L02  Sequence Start 15062297 15062307 15062308 15062353 15062355 15062459 15062476 15062484 15062485 15062591 15062666 15062684 15062698 15062704 15062733 15062776 15062776 15062791 15062892 15062924 15062929 15062958 15062968 15062990 15063007 15063049 15063056 15063060 15063124 15063171 15063182 15063246 15063271 15063274 15063317 15063331 15063365 15063444 15063484 15063519 15063561 15063706 15063737 15063756  Bit Score 1465 1083 1506 1064 1423 1443 1182 1238 1376 1347 1014 1441 983 1419 1419 1349 1371 1323 1306 1242 1496 1293 931 1262 1496 1256 1544 1314 1037 1271 1223 1447 1395 708 1328 
534 1496 1347 1432 1242 1378 1465 1321 1297  Sequence End 15063582 15064039 15060618 15066062 15060464 15065589 15064976 15067903 15061259 15064131 15067826 15061785 15065968 15061091 15065707 15064252 15067954 15068154 15060495 15061856 15061508 15063933 15062259 15065968 15063951 15065214 15066615 15066492 15064617 15065636 15065518 15061358 15067733 15064030 15061549 15065056 15061153 15062135 15065621 15067298 15067280 15063737 15064451 15064747  Bit Chromosome Score 1461 I 1142 I 1306 I 1400 I 1092 I 1443 I 1369 I 1410 I 1574 I 1504 I 1085 I 1376 I 1391 I 1190 I 1354 I 1434 I 1301 I 1476 I 1312 I 1310 I 1297 I 1288 I 1206 I 1400 I 1448 I 1445 I 1483 I 1384 I 1432 I 1465 I 1432 I 1404 I 830 I 1295 I 1312 I 1467 I 1179 I 1000 I 1279 I 1399 I 1338 I 1445 I 1221 I 1456 I  Gap* True True True True True True True True Gap True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True  Size 1285 1732 1690 3709 1891 3130 2500 5419 1226 1540 5160 899 3270 1613 2974 1476 5178 5363 2397 1068 1421 975 709 2978 944 2165 3559 3432 1493 2465 2336 1888 4462 756 1768 1725 2212 1309 2137 3779 3719 1551 714 991  86  Fosmid Name WRMHS23O07 WRMHS29G22 WRMHS11P20 WRMHS26D16 WRMHS40E23 WRMHS26B03 WRMHS06F04 WRMHS30F06 WRMHS21J14 WRMHS21I04 WRMHS27J19 WRMHS26O24 WRMHS28N10 WRMHS04O07 WRMHS29N08 WRMHS28A01 WRMHS19F17 WRMHS06I06 WRMHS40C08 WRMHS02I11 WRMHS16G02 WRMHS30I09 WRMHS37H20 WRMHS14H10 WRMHS30B12 WRMHS29L07 WRMHS32C18 WRMHS30P11 WRMHS33O10 WRMHS07J15 WRMHS14F20 WRMHS10N18 WRMHS29N01 WRMHS01C04 WRMHS01F14 WRMHS16B07 WRMHS26B10 WRMHS26A19 WRMHS40O13 WRMHS31P18 WRMHS11F24 WRMHS28G20 WRMHS28P15 WRMHS37M16  Sequence Start 15063832 15063892 15063911 15063912 15063919 15063933 15063966 15063971 15063974 15063992 15064016 15064102 15064122 15064156 15064156 15064170 15064199 15064206 15064217 15064260 15064397 15064406 15064431 15064469 15064489 15064499 15064536 15064561 
15064647 15064709 15064728 15064747 15064777 15064780 15064801 15064828 15064879 15064954 15064974 15065026 15065064 15065067 15065081 15065120  Bit Score 1437 1404 1284 1362 1437 1402 1400 1399 1371 1487 1273 1389 1266 1000 1306 1371 1199 1476 1498 1391 1367 1397 1273 1214 616 1260 1495 1480 1535 1280 1328 771 1406 1404 1027 1574 1458 1541 1519 1561 1090 1458 1151 1334  Sequence End 15065684 15061381 15061931 15066205 15062221 15065332 15065904 15065413 15067320 15065905 15062949 15063325 15062087 15063402 15064183 15063394 15062110 15060759 15061578 15066263 15061476 15061621 15067027 15066870 15064833 15060606 15062676 15066768 15066902 15065888 15061123 15064915 15061414 15061444 15066635 15063469 15061119 15065468 15067905 15063948 15060337 15060739 15067415 15061057  Bit Chromosome Score 1461 I 1116 I 1238 I 1214 I 1315 I 1395 I 1469 I 1404 I 1358 I 1352 I 1072 I 1158 I 1360 I 1157 I 1195 I 837 I 1254 I 1389 I 1216 I 1251 I 1423 I 1456 I 1254 I 1254 I 621 I 1317 I 1530 I 1347 I 1511 I 1487 I 1347 I 1319 I 1520 I 1363 I 1369 I 1461 I 1027 I 1341 I 1445 I 1282 I 1367 I 1391 I 1428 I 1247 I  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True  Size 1852 2511 1980 2293 1698 1399 1938 1442 3346 1913 1067 777 2035 754 1337 776 2089 3447 2639 2003 2921 2785 2596 2401 344 3893 1860 2207 2255 1179 3605 961 3363 3336 1834 1359 3760 1055 2931 1078 4727 4328 2334 4063  87  Fosmid Name WRMHS13M08 WRMHS28C08 WRMHS30G21 WRMHS26J02 WRMHS02H05 WRMHS28L11 WRMHS24G23 WRMHS29B13 WRMHS36K21 WRMHS28G16 WRMHS18P05 WRMHS36K15 WRMHS04C08 WRMHS27E02 WRMHS35E06 WRMHS38B06 WRMHS03F17 WRMHS29A19 WRMHS12J24 WRMHS33E09 WRMHS36A03 WRMHS40F19 WRMHS26B24 WRMHS10M19 WRMHS26O14 WRMHS35F06 WRMHS16I24 WRMHS32I12 WRMHS31L19 WRMHS09N07 WRMHS17I17 WRMHS34I22 WRMHS28N15 WRMHS40A17 WRMHS39K22 WRMHS35P07 WRMHS30J09 WRMHS02P20 
WRMHS29B23 WRMHS40P03 WRMHS05H19 WRMHS28H15 WRMHS05E23 WRMHS10M20  Sequence Start 15065135 15065148 15065156 15065169 15065216 15065223 15065225 15065243 15065270 15065320 15065335 15065361 15065367 15065394 15065417 15065429 15065432 15065500 15065524 15065541 15065582 15065582 15065595 15065664 15065691 15065694 15065703 15065750 15065877 15065879 15065883 15065931 15065963 15065995 15066041 15066064 15066067 15066068 15066078 15066114 15066128 15066138 15066166 15066172  Bit Score 1236 1362 1177 1297 1391 1448 1448 1232 1363 1315 1256 1341 1472 1581 1445 1434 693 1419 1271 1314 1417 368 1345 1485 592 1314 1352 1349 1483 1267 1406 1338 1367 1482 1369 1262 1548 1267 1238 1447 1380 1399 1487 1461  Sequence End 15066716 15065947 15064334 15062117 15063262 15063051 15062560 15067011 15060312 15068109 15063494 15063657 15065930 15062484 15064514 15060659 15060597 15060648 15062222 15063974 15067634 15067634 15064123 15064561 15064815 15063566 15062844 15067905 15067767 15063035 15064704 15068027 15064630 15064346 15067297 15065341 15061078 15060624 15062353 15068023 15063017 15062245 15065051 15068175  Bit Chromosome Score 1371 I 1434 I 1386 I 1158 I 1146 I 1371 I 1371 I 1537 I 1445 I 1391 I 1323 I 1435 I 1240 I 1474 I 1487 I 1277 I 1107 I 1155 I 1288 I 1570 I 1386 I 798 I 1347 I 1290 I 1216 I 1400 I 1450 I 1526 I 1358 I 1249 I 1290 I 1471 I 1448 I 527 I 1284 I 1391 I 1345 I 1284 I 1295 I 1330 I 1166 I 1382 I 1258 I 1491 I  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True Gap True True True True  Size 1581 799 822 3052 1954 2172 2665 1768 4958 2789 1841 1704 915 2910 903 4770 4835 4852 3302 1567 2052 2052 1472 1103 876 2128 2859 2155 1890 2844 1179 2096 1333 1649 1256 726 4989 5444 3725 1909 3111 3893 1115 2003  88  Fosmid Name WRMHS26I12 WRMHS28D22 WRMHS21F22 WRMHS15J19 WRMHS35D04 WRMHS26L05 WRMHS26F18 WRMHS29A09 
WRMHS28E23 WRMHS28K16 WRMHS40M22 WRMHS27A09 WRMHS34C18 WRMHS15B18 WRMHS01E13 WRMHS29M22 WRMHS01E18 WRMHS25L09 WRMHS21D07 WRMHS35P03 WRMHS08C18 WRMHS04H15 WRMHS02G10 WRMHS25N24 WRMHS39E08 WRMHS33B10 WRMHS13C23 WRMHS23C20 WRMHS09L19 WRMHS27K21 WRMHS16M01 WRMHS28F14 WRMHS27G23 WRMHS04B23 WRMHS19I08 WRMHS32H16 WRMHS26B23 WRMHS07N21 WRMHS27D20 WRMHS26A14 WRMHS03J10 WRMHS29J23 WRMHS12J19 WRMHS09B22  Sequence Start 15066199 15066201 15066226 15066253 15066328 15066354 15066434 15066455 15066475 15066497 15066499 15066601 15066656 15066675 15066729 15066772 15066784 15066803 15066807 15066811 15066814 15066831 15066865 15066938 15066940 15066948 15066970 15067025 15067085 15067184 15067188 15067223 15067224 15067226 15067247 15067255 15067292 15067327 15067347 15067359 15067362 15067399 15067441 15067453  Bit Score 1323 1279 1356 1448 1229 1445 1423 1526 1314 1310 1448 952 1635 1480 1496 1369 876 1352 1445 1395 1430 1188 1378 1365 573 1602 1328 1415 1236 1535 1591 1197 117 1323 1406 1260 1051 1423 1519 1345 1199 1304 1408 1384  Sequence End 15060827 15063848 15060752 15064767 15064463 15066474 15065363 15061991 15065382 15065675 15062484 15061151 15065090 15063612 15062602 15063229 15067170 15062759 15061351 15064312 15063457 15062096 15063491 15061220 15061887 15061090 15064082 15061901 15062923 15061075 15062584 15065344 15061790 15061572 15064845 15061781 15067869 15065637 15063053 15064062 15066493 15063001 15060302 15063049  Bit Chromosome Score 889 I 1507 I 1195 I 1245 I 965 I 1467 I 1005 I 1221 I 1269 I 1506 I 1330 I 784 I 1454 I 1245 I 1439 I 1360 I 1275 I 1334 I 1114 I 1443 I 1288 I 1218 I 1112 I 959 I 1432 I 1432 I 1415 I 1419 I 1243 I 1387 I 1513 I 1415 I 1332 I 826 I 1472 I 1474 I 1068 I 1397 I 1035 I 1140 I 1251 I 1419 I 1321 I 1232 I  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True Gap True True True True True True True  
Size 5372 2353 5474 1486 1865 1473 1071 4464 1093 822 4015 5450 1566 3063 4127 3543 803 4044 5456 2499 3357 4735 3374 5718 5053 5858 2888 5124 4162 6109 4604 1879 5434 5654 2402 5474 577 1690 4294 3297 869 4398 7139 4404  89  Fosmid Name WRMHS29N21 WRMHS21F14 WRMHS13M20 WRMHS11P04 WRMHS12N06 WRMHS25F06 WRMHS36C11 WRMHS35E23 WRMHS05N23 WRMHS27N02 WRMHS30H17 WRMHS28B20 WRMHS09B10 WRMHS09H05 WRMHS28H21 WRMHS30D14 WRMHS29P17 WRMHS12A07 WRMHS27M03 WRMHS26L22 WRMHS29B24 WRMHS28J04 WRMHS27I21 WRMHS34H11 WRMHS24N14 WRMHS27C11 WRMHS05C14 WRMHS33F09 WRMHS02O24 WRMHS27L18 WRMHS28E11 WRMHS07C14 WRMHS28L01 WRMHS33O01 WRMHS27E07 WRMHS25H03 WRMHS26J06 WRMHS36N14 WRMHS27A18 WRMHS09G21 WRMHS05O04 WRMHS20G06 WRMHS28F02 WRMHS27K17  Sequence Start 15067458 15067469 15067497 15067599 15067639 15067659 15067671 15067671 15067732 15067741 15067750 15067766 15067770 15067840 15067843 15067942 15067979 15067980 15068007 15068106 15068175 15068187 15068240 15068250 15069208 1019832 1583274 2423729 5134323 8276628 8288567 8288776 8288901 8288929 8289025 8289113 8289284 8289310 8289491 8289661 8289710 8290294 8290361 8290922  Bit Score 1279 1387 1365 1304 1410 1280 1583 1290 1463 1489 1474 1406 1179 994 1397 1452 1258 1701 1330 1356 1304 1308 1633 1410 1544 285 1031 1389 479 1277 1061 1096 1031 1358 1099 979 769 1062 1173 950 721 619 1140 1138  Sequence End 15060481 15060770 15066741 15066554 15063315 15060618 15066214 15066214 15064847 15065216 15062063 15066878 15063551 15062436 15067064 15065994 15064128 15061755 15061056 15060741 15065938 15063026 15065068 15060735 15062113 1029825 1591824 2414808 5147172 8289451 8291822 8289207 8289806 8292752 8289248 8288369 8290670 8291003 8292160 8288776 8289156 8288413 8288655 8290151  Bit Chromosome Score 1269 I 1133 I 1437 I 1229 I 1310 I 1308 I 1500 I 1295 I 1277 I 1458 I 1496 I 1432 I 1419 I 1328 I 1421 I 1437 I 1277 I 1413 I 1445 I 1168 I 1397 I 1415 I 1334 I 1426 I 1221 I 1441 II 1208 II 372 II 497 II 846 II 1112 II 785 II 1088 II 1177 II 939 
II 1109 II 745 II 872 II 1109 II 1074 II 1050 II 970 II 1125 II 1086 II  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True Gap Gap True True True True True True True True True True True True True True True Gap  Size 6977 6699 759 1045 4324 7041 1457 1457 2885 2525 5687 888 4219 5404 779 1948 3851 6225 6951 7365 2237 5161 3172 7515 7095 9993 8550 8921 12849 12823 3255 710 905 3823 1010 744 1386 1693 2669 885 537 1881 1706 771  90  Fosmid Name WRMHS14C19 WRMHS18A15 WRMHS36G15 WRMHS32O02 WRMHS30J14 WRMHS25J22 WRMHS14J07 WRMHS30N02 WRMHS27N01 WRMHS26A22 WRMHS25N02 WRMHS29P08 WRMHS04D12 WRMHS04L18 WRMHS14C24 WRMHS09N15 WRMHS07G06 WRMHS12B22 WRMHS27D15 WRMHS29O17 WRMHS28F16 WRMHS10F07 WRMHS30M08 WRMHS29O06 WRMHS30O15 WRMHS26D19 WRMHS10J06 WRMHS25N16 WRMHS01J10 WRMHS27O10 WRMHS01A22 WRMHS28C02 WRMHS28B15 WRMHS28I09 WRMHS28I06 WRMHS04O13 WRMHS18A02 WRMHS27G11 WRMHS29I09 WRMHS26K12 WRMHS29J15 WRMHS05D22 WRMHS26M08 WRMHS30J24  Sequence Start 8291254 8291281 8291578 8291616 8291671 8291746 8291917 8292125 8292422 8292563 9810445 9815520 9819575 12853740 12859687 13999151 925560 941198 1016830 1019605 3334340 3334408 3334497 3334560 3334570 3334698 3334754 3335314 3335399 3335423 3335506 3335989 3336357 5345583 5349346 7403795 7408508 10224629 10224659 10224688 10224699 10224749 10224752 10224757  Bit Score 1051 798 1079 1103 1072 928 870 1251 1127 1351 204 809 987 1373 1000 1044 1325 414 743 1007 935 496 627 881 1144 933 789 826 717 1103 950 929 1079 1175 1314 1168 1151 1107 965 924 1055 1014 1194 1192  Sequence End 8291991 8302902 8291784 8292472 8292436 8289085 8292717 8289547 8288850 8285140 9807784 9805447 9830375 12845411 12845434 13985246 931569 940921 1019970 1018505 3335287 3336707 3334925 3336331 3336558 3335264 3336423 3334422 3335912 3336189 3336162 3321012 3335532 5355089 5355537 7411651 7400640 10225367 10225435 10225183 10225403 10227299 10237144 10225356  Bit Chromosome Score 
1101 II 1417 II 1014 II 1096 II 1219 II 979 II 1072 II 1136 II 1166 II 1363 II 667 II 1179 II 1184 II 948 II 917 II 747 II 693 III 414 III 1081 III 1094 III 1275 III 1044 III 638 III 961 III 1262 III 817 III 966 III 941 III 809 III 1133 III 1024 III 1376 III 1107 III 1011 III 1230 III 1035 III 1273 III 1070 III 1146 III 782 III 592 III 1147 III 1160 III 1212 III  Gap* True True True True True True True True True True True True True True True Gap Gap True Gap True True True True True True True True True Gap True True True True True True True Gap True True True True True True True  Size 737 11621 1125 856 765 2661 800 2578 3572 7423 2661 10073 10800 8329 14253 13905 6009 277 3140 1100 947 2299 428 1771 1988 544 1669 892 513 766 656 14977 825 9506 6191 7856 7868 738 776 587 704 2550 12392 856  91  Fosmid Name WRMHS28B02 WRMHS09O20 WRMHS25E02 WRMHS10F12 WRMHS26O09 WRMHS26M09 WRMHS29F21 WRMHS07B13 WRMHS28D21 WRMHS27B11 WRMHS29E10 WRMHS28K14 WRMHS31J24 WRMHS27P12 WRMHS30D01 WRMHS28F18 WRMHS23P03 WRMHS30G17 WRMHS03O06 WRMHS27H16 WRMHS27G05 WRMHS26K14 WRMHS05K24 WRMHS29L16 WRMHS26K13 WRMHS29B20 WRMHS26G19 WRMHS25P16 WRMHS28O12 WRMHS29O14 WRMHS30E10 WRMHS07B12 WRMHS27F22 WRMHS35E01 WRMHS37N07 WRMHS19L11 WRMHS22J02 WRMHS27C01 WRMHS09I08 WRMHS27J09 WRMHS29G24 WRMHS11D07 WRMHS30P10 WRMHS28I24  Sequence Start 10224817 10224839 10224910 10225056 10225090 10225345 10225360 10225370 10225380 10225384 10225386 10225415 10225471 10225489 10225594 10225998 13586681 2828801 2828877 2829261 2829327 2829453 2829819 2829960 2829977 2830085 2830158 2830263 2830276 2830528 2830537 3203573 3207894 3212543 3213005 3218674 4406770 4416454 4416462 4416760 4417519 4418883 4419952 4426473  Bit Score 737 641 739 880 501 477 1055 1062 1018 1199 928 1085 1275 1188 1399 983 274 1110 856 1182 1273 989 928 813 850 1014 1208 861 1037 1035 1216 453 920 1003 1118 1212 826 1417 1059 1351 1037 830 1245 1218  Sequence End 10225320 10227078 10225361 10225390 10225546 10224706 10224651 10224616 10224677 
10224617 10224536 10224665 10224701 10224687 10224620 10224766 13571871 2829975 2829851 2829951 2830520 2829851 2828909 2830504 2829339 2828859 2828804 2828950 2828912 2828800 2829425 3208551 3211817 3207639 3203655 3207817 4419436 4417388 4420206 4427259 4419859 4427899 4417825 4419564  Bit Chromosome Score 1072 III 1037 III 1153 III 1098 III 913 III 819 III 1157 III 1225 III 1323 III 1122 III 1223 III 1208 III 1053 III 1037 III 1230 III 1155 III 1397 III 1253 IV 1068 IV 592 IV 1234 IV 961 IV 1037 IV 1005 IV 1031 IV 1081 IV 422 IV 979 IV 1024 IV 1105 IV 1101 IV 1245 IV 1116 IV 976 IV 1040 IV 1055 IV 839 IV 1393 IV 1133 IV 1014 IV 1005 IV 1491 IV 1160 IV 1173 IV  Gap* True True True True True True True True True True Gap True True True True True True True True True True True True True True True True True True Gap True Gap True True True True True True True True True True True True  Size 503 2239 766 937 456 639 675 660 743 767 850 730 770 802 974 1232 14810 1174 974 445 1193 856 910 544 638 1226 1354 1313 1364 1728 1112 4978 3923 4904 9350 10857 12666 934 3744 10499 2340 9016 2127 6909  92  Fosmid Name WRMHS26C03 WRMHS13B24 WRMHS33O12 WRMHS26G01 WRMHS11N12 WRMHS01F24 WRMHS26G18 WRMHS03J11 WRMHS23N18 WRMHS07A08 WRMHS01C05 WRMHS01C11 WRMHS05D04 WRMHS30D17 WRMHS30N22 WRMHS09H13 WRMHS40N10 WRMHS12K21 WRMHS27B15 WRMHS26G04 WRMHS28L12 WRMHS08M24 WRMHS29H21 WRMHS27I17 WRMHS07B04 WRMHS09L16 WRMHS20D20 WRMHS29K23 WRMHS27J05 WRMHS11O15 WRMHS21P14 WRMHS09K06 WRMHS12K16 WRMHS10F23 WRMHS11B11 WRMHS26N21 WRMHS27E03 WRMHS28D12 WRMHS30G19 WRMHS26C08 WRMHS28K11 WRMHS30F05 WRMHS05D17 WRMHS08F14  Sequence Start 4427142 4432064 4432392 6676544 8566672 8577296 8577318 8577977 8579689 8581318 8586990 9046016 9046948 9046949 9046961 9047501 9047702 9047742 9047779 11072934 11074244 11074298 11074300 11074691 11074770 11074901 11074906 11075476 11075592 11078091 11081059 12161244 12322053 12731613 13361904 13549405 13549405 13549405 13549405 13549405 13549507 13549507 16963438 16963446  
Bit Score 787 1138 1483 497 1092 695 785 652 752 861 1301 1077 880 1138 1190 835 1122 920 1086 734 852 433 418 909 811 926 941 483 1007 665 1533 1038 1284 856 894 180 174 180 180 178 180 180 534 843  Sequence End 4417746 4419291 4417858 6683317 8577448 8580973 8583057 8575493 8589359 8575518 8575670 9047779 9047686 9052865 9047703 9045921 9035931 9047089 9045999 11080781 11075679 11072764 11072060 11072805 11072972 11072945 11072516 11072807 11072509 11072505 11072938 12170138 12309209 12730795 13359934 13549507 13549507 13549507 13549507 13549507 13549405 13549405 16976159 16973114  Bit Chromosome Score 481 IV 1090 IV 1284 IV 795 IV 555 IV 867 IV 508 IV 979 IV 1197 IV 941 IV 1033 IV 983 IV 985 IV 1443 IV 1144 IV 900 IV 950 IV 1040 IV 1214 IV 1236 IV 695 IV 905 IV 905 IV 1005 IV 695 IV 542 IV 739 IV 662 IV 606 IV 351 IV 741 IV 1225 IV 1360 IV 811 IV 1306 IV 180 IV 174 IV 180 IV 180 IV 180 IV 180 IV 180 IV 843 IV 1251 IV  Gap* True True True Gap True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True Gap True True True True True True True True True  Size 9396 12773 14534 6773 10776 3677 5739 2484 9670 5800 11320 1763 738 5916 791 1580 11771 624 1780 7847 1435 1534 2240 1886 1798 1956 2390 2669 3083 5586 8121 8894 12844 818 1970 102 102 102 102 102 102 102 12721 9668  93  Fosmid Name WRMHS16A17 WRMHS35N22 WRMHS27M21 WRMHS12K18 WRMHS26I18 WRMHS30K17 WRMHS27I09 WRMHS08P13 WRMHS11A23 WRMHS28K18 WRMHS05J21 WRMHS26M20 WRMHS30D22 WRMHS01C16 WRMHS25K21 WRMHS28P01 WRMHS29O09 WRMHS26L11 WRMHS11I07 WRMHS30P22 WRMHS29A22 WRMHS04K15 WRMHS30J04 WRMHS28I08 WRMHS06F24 WRMHS29D20 WRMHS27G24 WRMHS27M06 WRMHS28H13 WRMHS30C13 WRMHS26M16 WRMHS12G17 WRMHS08G22 WRMHS29I08 WRMHS26P10 WRMHS26B05 WRMHS29D07 WRMHS04F04 WRMHS05E04 WRMHS04K11 WRMHS03G15 WRMHS27A23 WRMHS26B13 WRMHS04M07  Sequence Start 16976924 253541 265076 265199 265230 265580 265652 265690 265725 265846 266023 266075 266091 266133 
266170 266171 266181 266204 266302 266406 266462 266579 266761 266764 266766 266776 266806 266827 266848 266867 266911 267007 267191 267234 1103175 1104561 1104563 1104592 1104632 1104701 1104736 1104921 1105022 1107008  Bit Score 1382 1303 1338 1168 1099 1240 909 1068 846 1101 922 1284 1234 1158 869 1127 1057 1136 1053 1079 1123 963 595 1011 893 1107 1140 1096 1022 1109 1155 1358 957 977 444 669 621 678 610 420 424 1254 715 966  Sequence End 16963095 265927 267244 266803 267148 266363 266498 267008 266240 266288 266808 266870 267134 274687 266439 266805 267066 266560 266827 266803 265591 267228 266897 266104 265730 265308 265881 266118 266163 266111 265920 266067 266362 266380 1105054 1105573 1104954 1105048 1105054 1105034 1105053 1102436 1104561 1104565  Bit Chromosome Score 1127 IV 1168 V 1304 V 1062 V 708 V 1083 V 449 V 1142 V 1279 V 1199 V 1053 V 924 V 1199 V 1245 V 1158 V 955 V 1088 V 632 V 1151 V 1086 V 1175 V 918 V 1118 V 1042 V 1035 V 1086 V 1081 V 1114 V 1144 V 1212 V 850 V 1136 V 1155 V 1206 V 531 V 1142 V 636 V 702 V 747 V 697 V 773 V 828 V 743 V 730 V  Gap* Gap True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True  Size 13829 12386 2168 1604 1918 830 846 1318 859 1017 785 603 1043 8554 1014 720 885 764 880 1015 871 649 976 660 1036 1468 925 691 660 810 991 940 829 854 1879 1012 395 456 422 333 317 2485 476 2443  94  Fosmid Name WRMHS03E19 WRMHS16D06 WRMHS29A06 WRMHS08P06 WRMHS04H16 WRMHS29H06 WRMHS39I18 WRMHS31J16 WRMHS28L06 WRMHS27O21 WRMHS30M09 WRMHS09L01 WRMHS30B07 WRMHS27B23 WRMHS01E17 WRMHS12F15 WRMHS27K09 WRMHS09I15 WRMHS04J23 WRMHS26N19 WRMHS04D03 WRMHS29M05 WRMHS07G19 WRMHS25H15 WRMHS27E18 WRMHS09F01 WRMHS15N24 WRMHS05N15 WRMHS28C15 WRMHS27N12 WRMHS04B17 WRMHS28E20 WRMHS25M08 WRMHS25O06 WRMHS29J12 WRMHS26M07 WRMHS17F06 WRMHS37J02 WRMHS38N21 WRMHS37G01 WRMHS06F14 WRMHS08C07 WRMHS16J09 
WRMHS23A01  Sequence Start 1112848 2216373 3095779 3099004 3101488 5074561 5269669 5272798 5282864 5283316 5283600 5283833 5283834 5283892 5283932 5284471 5284892 5285715 6174365 6181556 6186920 6189114 6190135 6939200 6939473 6939527 10595141 10602831 10604666 10605999 10606652 10606751 10606759 10607273 10614200 10619057 10620243 14870620 14873486 14880476 14881987 14882506 14891515 17120467  Bit Score 521 1496 1155 294 1199 795 1489 1465 865 1325 1127 588 1101 1066 859 918 878 182 935 1203 1291 896 1454 972 1072 970 1417 1027 1363 235 719 1267 1118 835 1101 1495 1445 1358 1543 1382 1402 1266 785 309  Sequence End 1104561 2205888 3108191 3108419 3092204 5073913 5284321 5284092 5285622 5281267 5291013 5287702 5284540 5290680 5296939 5270602 5286618 5274840 6183415 6174319 6174349 6174357 6175519 6939527 6936934 6938067 10608706 10608692 10607655 10605873 10615167 10607691 10609591 10607410 10606800 10605996 10608031 14880314 14879183 14883133 14879026 14892224 14883152 17122924  Bit Chromosome Score 712 V 1321 V 1166 V 1249 V 894 V 1177 V 898 V 763 V 1009 V 1266 V 1443 V 1236 V 1077 V 1271 V 1099 V 1358 V 965 V 1157 V 1199 V 821 V 942 V 972 V 833 V 1014 V 1131 V 1253 V 1075 V 1055 V 1232 V 265 V 512 V 1315 V 1166 V 1218 V 1184 V 1219 V 891 V 523 V 1251 V 1291 V 1434 V 1417 V 1216 V 926 V  Gap* True True True True True True True True True True True True True True True True Gap True True True True True True True True True True True True True Gap True True True True True True True True True True True True Gap  Size 8287 10485 12412 9415 9284 573 14652 11294 2758 2049 7413 3869 796 6788 13007 13869 1726 10875 9050 7237 12571 14757 14616 802 2539 1460 13565 5861 2989 142 8515 940 2832 1083 7400 13061 12212 9694 5697 2657 2961 9718 8363 2457  95  Fosmid Name WRMHS01E09 WRMHS37I14 WRMHS36B04 WRMHS15J03 WRMHS08N08 WRMHS35F02 WRMHS30J23 WRMHS32K09 WRMHS01B16 WRMHS36F10 WRMHS03M12 WRMHS14G05 WRMHS15L11 WRMHS02D14 WRMHS25A23 WRMHS07M16 WRMHS30A05 WRMHS26A11 WRMHS28K02 
WRMHS08B22 WRMHS29E03 WRMHS27D12 WRMHS27N06 WRMHS10L04 WRMHS12J05 WRMHS01I16 WRMHS05I06 WRMHS29K04 WRMHS29M01 WRMHS04B20 WRMHS28M13 WRMHS01A17 WRMHS29O11 WRMHS28J03 WRMHS06L24 WRMHS08N07 WRMHS29L01 WRMHS28K06 WRMHS28H10 WRMHS30D18 WRMHS01M01 WRMHS27K02 WRMHS01C18 WRMHS29P15  Sequence Start 17122329 17122414 17122586 17122616 17122891 17123231 17123348 17123592 17123722 17123873 17123899 17124020 17124526 17130274 18242128 102328 109734 109861 110151 1105719 110767 110794 110852 110928 111462 111623 111940 112256 112319 112398 112657 112792 112803 112828 113259 113368 113444 113449 113453 113784 113793 113930 113948 113968  Bit Score 1369 1064 1247 1260 1188 1175 1378 1430 863 959 1149 1122 1362 1254 935 645 143 1040 470 1086 929 1147 987 630 800 658 651 902 555 1271 1099 1075 765 1090 1221 549 774 904 905 946 1218 959 625 913  Sequence End 17123879 17124620 17123693 17123319 17123155 17124343 17124302 17122467 17123226 17123320 17123121 17122460 17122470 17137195 18252831 114186 114000 111216 110419 113657 113859 107537 114048 113550 110997 112947 112447 113118 113377 113559 114038 113834 113973 112871 113964 112979 112769 114211 114233 111645 110533 113229 110295 112754  Bit Chromosome Score 1286 V 1171 V 1197 V 1206 V 1164 V 1205 V 1310 V 1421 V 761 V 1042 V 1334 V 1194 V 1262 V 1286 V 440 V 1005 X 1096 X 455 X 475 X 966 X 952 X 989 X 874 X 872 X 800 X 1026 X 671 X 1171 X 1205 X 911 X 1166 X 1210 X 911 X 1064 X 1118 X 381 X 784 X 826 X 885 X 1085 X 979 X 1013 X 1003 X 885 X  Gap* True True True True True True True True True True True True True Gap True True True True True True True True True True True Gap True True True True True True True True True True True True True True True True True True  Size 1550 2206 1107 703 1041 1112 954 1125 496 544 778 1560 2056 6921 10703 11858 4266 1355 268 3078 3092 3257 3196 2622 465 1324 507 862 1058 1161 1381 1042 1170 1298 681 389 675 762 780 2139 3260 720 3653 1214  96  Fosmid Name WRMHS26F06 WRMHS27M17 WRMHS26N06 WRMHS30M16 
WRMHS30L09 WRMHS30C10 WRMHS11O04 WRMHS28K21 WRMHS28E22 WRMHS30D15 WRMHS27J21 WRMHS09J03 WRMHS35A02 WRMHS29K11 WRMHS02P02 WRMHS30H02 WRMHS29C06 WRMHS05K20 WRMHS09D15 WRMHS05E06 WRMHS29L22 WRMHS30H10 WRMHS09L21 WRMHS30N21 WRMHS12P18 WRMHS29D10 WRMHS30C16 WRMHS28L23 WRMHS26G12 WRMHS07B17 WRMHS28A02 WRMHS10D11 WRMHS30E03 WRMHS29E20 WRMHS01P07 WRMHS29C20 WRMHS16J06 WRMHS05I19 WRMHS26C06 WRMHS29J19 WRMHS39G13 WRMHS25I03 WRMHS12J20 WRMHS29B01  Sequence Start 113973 113973 114103 114160 114223 114223 114226 118493 290240 292764 292766 293000 293024 293178 293483 293671 293899 293931 294013 294038 294121 294175 294245 294484 294603 294861 1631699 1633167 1633184 1634659 1636522 1636612 1636704 1636965 1637308 1637471 1637483 1638044 1638394 1638719 1654843 3484059 5056582 7069760  Bit Score 586 1044 878 1059 918 1214 959 1214 1173 1158 1310 1059 1194 1092 435 1400 1229 1105 1011 963 1110 1358 961 1114 1195 1040 1312 1142 1273 1050 968 893 1175 846 1005 1062 990 795 1347 874 1382 1371 1295 1295  Sequence End 113082 111665 113509 112603 113223 112067 112765 113432 294505 294084 293879 294590 294506 293567 292734 294595 295191 294828 294792 294750 293423 305320 294823 293076 293111 292074 1636850 1637467 1637226 1637127 1637425 1646541 1636917 1641608 1624757 1634662 1628890 1645055 1640095 1636673 1641638 3481743 5067254 7078393  Bit Chromosome Score 540 X 1227 X 824 X 1085 X 931 X 987 X 782 X 1327 X 1229 X 1426 X 1334 X 1177 X 1253 X 1295 X 477 X 1175 X 1338 X 809 X 1236 X 1151 X 966 X 1456 X 1155 X 1260 X 1358 X 1325 X 1201 X 1249 X 427 X 1190 X 1122 X 1380 X 1158 X 1367 X 1310 X 1291 X 1454 X 1155 X 1391 X 1085 X 1170 X 739 X 1356 X 1009 X  Gap* True True True True True True True True Gap True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True  Size 891 2308 594 1557 1000 2156 1461 5061 4265 1320 1113 1590 1482 957 749 924 1292 897 779 712 698 11145 578 
1408 1492 2787 5151 4300 4042 2468 903 9929 1171 4643 12551 2809 8593 7011 1701 2046 13205 2316 10672 8633  97  Fosmid Name WRMHS26C22 WRMHS26L18 WRMHS25H06 WRMHS30I08 WRMHS28E18 WRMHS28O18 WRMHS30E07 WRMHS30D20 WRMHS29H12 WRMHS26I02 WRMHS08M22 WRMHS26O12 WRMHS05N10 WRMHS29A08 WRMHS26I09 WRMHS09L13 WRMHS05C19 WRMHS30O12 WRMHS30H01 WRMHS08F12 WRMHS28P18 WRMHS29D02 WRMHS07J02 WRMHS05E14 WRMHS27G20 WRMHS30K11 WRMHS02N16 WRMHS09A06 WRMHS29G21 WRMHS28M11 WRMHS12C22 WRMHS03G08 WRMHS28C12 WRMHS30C05 WRMHS27N03 WRMHS29O04 WRMHS30K23 WRMHS07K01 WRMHS03A08 WRMHS27D17 WRMHS28L24 WRMHS33E15 WRMHS28M12 WRMHS29O21  Sequence Start 7076564 7077464 7077473 7077475 7077476 7077476 7077476 7077477 7077488 7077513 7077516 7077516 7077544 7077549 7077556 7077561 7077563 7077563 7077567 7077568 7077568 7077570 7077576 7077584 7077586 7077597 7077616 7077616 7077620 7077621 7077623 7077629 7077634 7077644 7077676 7077689 7077689 7077691 7077694 7077700 7077710 7077734 7077817 7077841  Bit Score 1476 612 466 699 802 715 854 660 361 446 385 420 460 307 643 691 575 876 887 449 725 621 486 464 813 414 625 375 789 640 481 372 791 907 878 682 1040 850 815 652 717 784 761 898  Sequence End 7079148 7077813 7079038 7079185 7078541 7079029 7079109 7079153 7078649 7078944 7078473 7079116 7079185 7079145 7078349 7078468 7078524 7078621 7079045 7078094 7079161 7078286 7079216 7078613 7079208 7078380 7078655 7079129 7078120 7078384 7079029 7078228 7079201 7079029 7078285 7079090 7079121 7078843 7078769 7078286 7078309 7078275 7079161 7078908  Bit Chromosome Score 928 X 339 X 747 X 1064 X 952 X 952 X 713 X 942 X 804 X 436 X 617 X 726 X 758 X 974 X 715 X 922 X 734 X 1016 X 961 X 592 X 1026 X 872 X 896 X 769 X 970 X 918 X 651 X 837 X 854 X 905 X 446 X 388 X 1040 X 929 X 876 X 959 X 981 X 972 X 656 X 843 X 922 X 1003 X 974 X 1011 X  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True 
True True True True True True True True True True  Size 2584 331 1565 1710 1065 1553 1633 1676 1161 1431 957 1600 1641 1596 793 907 961 1058 1478 526 1593 716 1640 1029 1622 783 1039 1513 749 763 1406 599 1567 1385 698 1401 1432 1152 1075 586 599 858 1344 1067  98  Fosmid Name WRMHS27C06 WRMHS27P08 WRMHS29I23 WRMHS06H02 WRMHS05M24 WRMHS26O13 WRMHS27D14 WRMHS21D06 WRMHS25M03 WRMHS30N11 WRMHS06L08 WRMHS30H03 WRMHS27J12 WRMHS28H14 WRMHS26H17 WRMHS29F01 WRMHS30E09 WRMHS29O10 WRMHS07O14 WRMHS40G24 WRMHS30N20 WRMHS29G13 WRMHS29B03 WRMHS04G13 WRMHS30I20 WRMHS29G09 WRMHS30A10 WRMHS27F21 WRMHS29C23 WRMHS26G15 WRMHS29C13 WRMHS29B09 WRMHS28N11 WRMHS27H04 WRMHS28D16 WRMHS08C09 WRMHS29C09 WRMHS30H04 WRMHS27J02 WRMHS27C12 WRMHS30F14 WRMHS28P02 WRMHS08I13 WRMHS28A22  Sequence Start 7077844 7077859 7077890 7077952 7077952 7077991 7078014 7078033 7078072 7078114 7078136 7078145 7078155 7078172 7078184 7078191 7078192 7078207 7078212 7078215 7078226 7078241 7078268 7078293 7078331 7078358 7078361 7078375 7078380 7078386 7078393 7078395 7078420 7078452 7078456 7078486 7078490 7078494 7078507 7078552 7078572 7078599 7078630 7078653  Bit Score 902 704 867 845 654 374 676 523 839 843 776 721 315 756 832 822 848 636 728 763 464 863 612 571 998 662 1138 955 857 913 939 857 737 833 885 368 891 758 926 902 512 688 828 721  Sequence End 7078545 7078092 7079062 7078523 7079209 7078265 7079183 7083641 7078368 7078973 7079024 7081590 7078375 7079205 7079062 7079048 7078206 7079130 7078341 7077548 7078364 7078372 7078448 7077532 7079183 7079173 7077476 7077612 7077488 7077536 7079216 7079021 7079162 7078769 7079084 7078785 7078689 7079121 7079109 7078955 7078689 7079066 7077605 7078985  Bit Chromosome Score 907 X 878 X 1055 X 905 X 876 X 379 X 773 X 1042 X 1098 X 1048 X 739 X 981 X 326 X 937 X 560 X 1007 X 909 X 776 X 1059 X 399 X 989 X 865 X 833 X 401 X 1018 X 981 X 1009 X 712 X 795 X 470 X 1000 X 977 X 981 X 883 X 902 X 793 X 972 X 926 X 863 X 728 X 917 X 946 X 388 X 1050 X  Gap* True True 
True True True True True True True True True True True True True True True True True True True True True True Gap True True True True True True True True True True True True True True True True True True True  Size 701 908 1172 773 1257 274 1169 5608 1177 859 888 3445 220 1033 878 857 1358 923 1221 667 986 1216 907 761 852 815 885 763 892 850 823 755 742 1032 628 299 1196 627 685 791 1089 780 1025 989  99  Fosmid Name WRMHS26B17 WRMHS29G07 WRMHS27L20 WRMHS08G14 WRMHS25K14 WRMHS27C15 WRMHS03D17 WRMHS26I07 WRMHS29H03 WRMHS26P13 WRMHS08O08 WRMHS05H04 WRMHS28L22 WRMHS26B18 WRMHS30B15 WRMHS29B02 WRMHS27L02 WRMHS27C09 WRMHS30F24 WRMHS04H18 WRMHS10B15 WRMHS26B20 WRMHS19B04 WRMHS26J14 WRMHS26K15 WRMHS29A04 WRMHS11P23 WRMHS29H05 WRMHS28J02 WRMHS28J24 WRMHS29I20 WRMHS09F13 WRMHS30P17 WRMHS27F20 WRMHS10K03 WRMHS28J16 WRMHS34G24 WRMHS28K13 WRMHS27G08 WRMHS35M05 WRMHS08N23 WRMHS29G23 WRMHS12P17 WRMHS28C16  Sequence Start 7078656 7078690 7078729 7078759 7078801 7078809 7078811 7078832 7078833 7078852 7078856 7078862 7078896 7078912 7078920 7078941 7078951 7078955 7078955 7078973 7079049 7079051 7079063 7079084 7079097 7079102 7079105 7079108 7079111 7079136 7079141 7079185 7079185 7079195 7079197 7079216 7082353 7082823 7086042 7092173 9177559 9179382 9180690 9185102  Bit Score 551 468 970 494 787 952 725 436 361 424 593 375 793 952 1085 296 946 924 1026 752 872 1099 813 756 1079 351 926 518 747 710 809 664 977 902 745 918 1330 1096 555 804 1330 1378 1256 1011  Sequence End 7079046 7079136 7077756 7079105 7077680 7077532 7077490 7077625 7079090 7079213 7077986 7078591 7077823 7077584 7078001 7077952 7077584 7078012 7077858 7077487 7077584 7077984 7078506 7077616 7077730 7077700 7078353 7078012 7077476 7077570 7078192 7078633 7078274 7078141 7075583 7078450 7077585 7077535 7077680 7077556 9186961 9186261 9188463 9186693  Bit Chromosome Score 556 X 939 X 761 X 479 X 778 X 909 X 865 X 669 X 381 X 564 X 472 X 387 X 730 X 472 X 959 X 778 X 712 X 728 X 891 X 448 X 645 X 804 X 756 X 
327 X 802 X 405 X 647 X 765 X 616 X 523 X 865 X 691 X 922 X 549 X 1426 X 972 X 641 X 957 X 148 X 220 X 715 X 1162 X 667 X 1166 X  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True Gap True True True True True True True True True True True True  Size 390 446 973 404 1121 1277 1321 1207 257 361 870 271 1073 1328 919 989 1367 943 1097 1486 1465 1067 557 1468 1367 1402 752 1096 1635 1566 949 552 911 1054 3614 766 4768 5288 8362 14617 9402 6879 7773 1591  100  Fosmid Name WRMHS28B07 WRMHS10C11 WRMHS29E07 WRMHS29D22 WRMHS05G04 WRMHS28C01 WRMHS27G14 WRMHS05M07 WRMHS03M15 WRMHS26D21 WRMHS09A21 WRMHS05I02 WRMHS21E21 WRMHS10K08 WRMHS30P08 WRMHS26M06 WRMHS26N05 WRMHS27M20 WRMHS28N09 WRMHS27E10 WRMHS29M23 WRMHS09A01 WRMHS30P16 WRMHS07E13 WRMHS01L16 WRMHS01P05 WRMHS29J20 WRMHS07H23 WRMHS29N07 WRMHS25G19 WRMHS30B08 WRMHS10K13 WRMHS28F06 WRMHS10I16 WRMHS26L17 WRMHS27N08 WRMHS25I19 WRMHS01B04 WRMHS30L22 WRMHS11O08 WRMHS28G08 WRMHS29F22 WRMHS04P12 WRMHS29J24  Sequence Start 9185512 9185870 9186487 9186525 9186832 9186867 9186893 9186900 9188495 9188521 9188542 9188597 9189786 9201031 11440660 11442711 11442776 11442991 11443569 11449622 11774695 11782637 11784130 11786499 11786520 11786703 11786906 11787322 11788323 11788999 11789080 11789205 11789471 11880024 11887402 11887468 11887766 12277629 14385908 14897726 14908076 14908470 14909340 14909764  Bit Score 1083 1158 950 941 924 1103 1066 782 603 1214 802 547 963 1219 1112 556 1079 983 1112 1321 1338 1410 1393 1096 1099 1090 636 1136 924 1038 1273 1149 981 1149 1079 891 455 549 1339 1133 880 1020 896 828  Sequence End 9186525 9199097 9175909 9176725 9180904 9185096 9179506 9186035 9185889 9186670 9186270 9174816 9187945 9186055 11443423 11443120 11449635 11446910 11433821 11442968 11787198 11786833 11787157 11789170 11788900 11789146 11781289 11774354 11792461 11788146 11786155 11786492 11788463 11888107 11887851 
11887914 11879940 12278100 14394760 14909123 14908848 14908746 14908095 14908255  Bit Chromosome Score 948 X 1424 X 1517 X 1341 X 1312 X 1367 X 1136 X 1018 X 1009 X 959 X 1048 X 1208 X 992 X 1074 X 1053 X 571 X 1118 X 1247 X 1421 X 832 X 453 X 1037 X 1188 X 728 X 894 X 992 X 1351 X 1168 X 1088 X 1160 X 1312 X 1256 X 985 X 1136 X 682 X 992 X 1376 X 385 X 1354 X 736 X 1057 X 898 X 905 X 595 X  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True  Size 1013 13227 10578 9800 5928 1771 7387 865 2606 1851 2272 13781 1841 14976 2763 409 6859 3919 9748 6654 12503 4196 3027 2671 2380 2443 5617 12968 4138 853 2925 2713 1008 8083 702 722 7826 334 8852 11397 772 1100 1245 1509  101  Fosmid Name WRMHS26L19 WRMHS29L21 WRMHS29O12 WRMHS28P11 WRMHS20J18 WRMHS30F07 WRMHS12P15 WRMHS27B04 WRMHS27O08 WRMHS30N19 WRMHS30B06 WRMHS30B14 WRMHS30B16 WRMHS06A16 WRMHS29F07 WRMHS27H13 WRMHS10C24 WRMHS29L23 WRMHS28O21 WRMHS01C20 WRMHS12B18 WRMHS02M08 WRMHS28L03 WRMHS27B16 WRMHS28H18 WRMHS28C21 WRMHS28N05 WRMHS30G24 WRMHS26M23 WRMHS27K10 WRMHS30O11 WRMHS19N20 WRMHS26B12 WRMHS28E02 WRMHS30L07 WRMHS05L08 WRMHS29K20 WRMHS30F01 WRMHS28M19 WRMHS01M03 WRMHS06G08 WRMHS29H10 WRMHS29B19  Sequence Start 14910973 14914609 15807564 15807607 15807671 15807906 15808075 15808153 15808220 15808253 15808303 15808324 15808342 15808379 15808410 15808547 15808583 15808646 15808687 15808731 15808946 15808997 15808999 15809021 15809124 15809139 15809145 15809527 15809571 15809712 15810128 15810499 15810687 15810753 15810898 15810913 15811012 15811030 15811040 15811049 15811082 15811092 17317027  Bit Score 1365 1223 737 898 717 791 848 959 641 1083 1055 1094 390 1066 691 1016 1099 654 1075 361 994 898 859 920 1040 1042 968 983 1267 833 1240 702 477 424 1035 682 603 1048 215 664 1131 828 1338  Sequence End 14908120 14908208 15810300 15808639 15809565 
15808465 15809972 15811087 15808611 15808960 15811090 15808726 15811026 15808676 15810946 15807557 15808986 15807885 15807772 15808296 15807952 15807660 15808288 15808310 15808361 15808126 15808429 15807574 15807938 15808566 15809166 15811029 15810974 15811035 15809661 15808292 15808488 15810266 15810909 15808413 15808420 15810304 17305399  Bit Chromosome Score 769 X 1027 X 955 X 1018 X 695 X 785 X 1210 X 767 X 647 X 1007 X 994 X 1114 X 433 X 1134 X 990 X 721 X 1216 X 1053 X 874 X 752 X 994 X 900 X 1000 X 808 X 1177 X 1127 X 1048 X 992 X 1110 X 987 X 1282 X 937 X 472 X 440 X 1199 X 739 X 684 X 1050 X 231 X 691 X 1000 X 1011 X 1367 X  Gap* True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True  Size 2853 6401 2736 1032 1894 559 1897 2934 391 707 2787 1008 2684 1159 2536 990 1086 761 915 255 994 1337 711 711 763 1013 716 1953 1633 1146 962 530 287 282 1237 2621 2524 764 131 2636 2662 788 11628

Fosmid Name  Sequence Start  Bit Score  Sequence End  Bit Score  Chromosome  Gap*  Size
WRMHS24E10   11425095        1264       11489948      556        I           True  64853
WRMHS18I10   11428940        1304       11492586      981        I           True  63646
WRMHS14E15   11429691        1321       11486236      1269       I           True  56545
WRMHS08C19   1822301         867        1940449       1365       II          Gap   118148
WRMHS30H08   1017641         108        10225476      416        III         True  9207835
WRMHS28B17   1018398         743        10225375      1254       III         True  9206977
WRMHS27J24   1019269         1107       10225235      913        III         True  9205966
WRMHS25K06   7444692         311        926690        1415       III         True  6518002
WRMHS21F19   4110160         1550       4187837       497        IV          True  77677
WRMHS28I03   4112258         1301       4189456       1315       IV          True  77198
WRMHS34N02   4201075         1386       4121802       1360       IV          True  79273
WRMHS37H02   9065943         802        14248025      1002       IV          True  5182082
WRMHS23K12   10321619        1275       10950551      1179       IV          True  628932
WRMHS04J20   13509925        791        3401166       1358       IV          True  10108759
WRMHS36L17   15927960        1408       17171723      351        IV          Gap   1243763
WRMHS13A22   2177671         1332       18429864      819        V           True  16252193
WRMHS04B11   2379160         961        2108446       294        V           True  270714
WRMHS37O08   2423815         531        2569999       1234       V           True  146184
WRMHS35D22   2435713         1258       2578303       1336       V           True  142590
WRMHS08B20   2444577         1206       2586241       392        V           True  141664
WRMHS21B11   2579350         183        2429956       582        V           True  149394
WRMHS36O17   2579713         1384       2431166       435        V           True  148547
WRMHS24K17   2593743         1251       2445406       532        V           True  148337
WRMHS31I09   2598479         1367       2451497       1195       V           True  146982
WRMHS02N11   3294263         1376       3401684       412        V           True  107421
WRMHS08B04   3329209         1452       5663665       1386       V           True  2334456
WRMHS07P22   3330634         1212       3265189       933        V           True  65445
WRMHS39D23   3412397         1031       766562        1363       V           True  2645835
WRMHS14L12   3620906         351        7673722       1179       V           True  4052816
WRMHS06C05   3889516         1424       17814520      496        V           True  13925004
WRMHS21L14   8823814         1465       8907151       1066       V           True  83337
WRMHS25G15   8832890         1463       8907816       1358       V           True  74926
WRMHS06H10   8851961         1014       8927018       1197       V           True  75057
WRMHS28E14   8900602         1426       8823932       1406       V           True  76670
WRMHS37I22   8905261         1304       8819862       1334       V           True  85399
WRMHS12N14   8918536         1194       8846586       1315       V           True  71950
WRMHS17J20   14778352        150        15137122      444        V           True  358770
WRMHS27C08   15899332        1362       17170843      159        V           True  1271511
WRMHS32P16   16169358        1411       16226844      983        V           Gap   57486
WRMHS11L24   16922909        909        16978077      1399       V           True  55168
WRMHS17K18   16977321        1443       16922172      647        V           Gap   55149
WRMHS10F09   17123120        1164       17428106      1330       V           True  304986
WRMHS32P01   17405279        695        17501053      1459       V           True  95774
WRMHS30G09   17413321        893        17509035      918        V           True  95714
WRMHS07M20   17423961        1330       17520732      1447       V           True  96771
WRMHS22J12   17428220        1264       17122853      911        V           True  305367
WRMHS30D09   17433187        1513       17529165      1393       V           True  95978
WRMHS25P09   17505742        584        17429242      708        V           True  76500
WRMHS36C21   17507209        429        17262329      636        V           True  244880
WRMHS22O21   17514137        1254       17421325      1153       V           True  92812
WRMHS07G03   17524664        1397       17426227      1166       V           True  98437
WRMHS17K20   17525688        1256       17426996      1234       V           True  98692
WRMHS35B04   17530231        1334       17427209      1404       V           True  103022
WRMHS21K01   17551073        1057       17293592      508        V           True  257481
WRMHS04K04   17787850        569        17848799      60.2       V           Gap   60949
WRMHS21L15   18165232        1432       18262933      1264       V           True  97701
WRMHS25B02   18177783        1293       18268914      1450       V           True  91131
WRMHS18K20   18265458        1214       18175693      889        V           True  89765
WRMHS18C13   19294299        628        19350053      1356       V           Gap   55754
WRMHS02D11   19356576        313        19301218      1098       V           True  55358
WRMHS18P17   1736033         1400       1808657       1218       X           True  72624
WRMHS10A21   1744670         1199       1811070       1487       X           True  66400
WRMHS15C05   1744733         1467       1815043       1362       X           True  70310
WRMHS40P12   1789255         1443       1728594       1290       X           True  60661
WRMHS14E12   1806153         1378       1737317       1286       X           True  68836
WRMHS01C19   1902224         1290       1960431       1482       X           True  58207
WRMHS32N06   1961105         1301       1905094       1186       X           True  56011
WRMHS14A13   10285262        1517       9072085       1511       X           True  1213177

* Gap indicates whether the fosmid, when tiled, aligned next to a gapped region (Gap) or whether the surrounding sequence was a true contig (True).
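The thesis does not define the Size column in this section, but in the rows above it appears to equal the absolute difference between Sequence Start and Sequence End. The following is a minimal sketch for checking that relationship on a few rows copied from the table. The names ROWS, span and mismatches are illustrative only and are not part of the thesis.

```python
# Rows copied from the table above: (fosmid, sequence_start, sequence_end, size).
# Assumption: Size is the absolute distance between the two end coordinates.
ROWS = [
    ("WRMHS24E10", 11425095, 11489948, 64853),
    ("WRMHS25K06", 7444692, 926690, 6518002),
    ("WRMHS14A13", 10285262, 9072085, 1213177),
]


def span(start: int, end: int) -> int:
    """Absolute distance between the two aligned end coordinates."""
    return abs(end - start)


def mismatches(rows):
    """Return the names of rows whose listed size differs from the computed span."""
    return [name for name, start, end, size in rows if span(start, end) != size]


print(mismatches(ROWS))  # prints [] when every listed size matches
```

The absolute value is used because the end coordinate is smaller than the start coordinate in many rows, which may reflect the orientation of the cloned insert.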
