Open Collections

UBC Theses and Dissertations

Genomic mapping, functional analysis and bioinformatics of the Werner syndrome locus (WRN) Bruskiewich, Richard Michael Maurice 1999

GENOMIC MAPPING, FUNCTIONAL ANALYSIS AND BIOINFORMATICS OF THE WERNER SYNDROME LOCUS (WRN)

by

RICHARD MICHAEL MAURICE BRUSKIEWICH
B.Sc., University of British Columbia, 1992

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF MEDICAL GENETICS, MEDICAL GENETICS GRADUATE PROGRAM

We accept this thesis as conforming to the required standard:

THE UNIVERSITY OF BRITISH COLUMBIA
October 1998
© Richard Michael Maurice Bruskiewich, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Medical Genetics
The University of British Columbia
Vancouver, Canada

ABSTRACT

Werner syndrome is a human genetic disorder exhibiting a phenotype of premature senility with adolescent age of onset. Genetic analysis supported the hypothesis that Werner syndrome is a lesion in a single locus, WRN, assigned to the short arm of human chromosome 8 by genetic linkage to a polymorphic marker, D8S87 (Goto et al, 1992). The primary goal of this thesis was the structural and functional genomic characterization of WRN.

The thesis consists of two parts. The first part undertook fine scale refinement of the genetic and physical map of the candidate genome interval from markers D8S87 to D8S137. The second part applied bioinformatics to the functional analysis of WRN through comparative sequence analysis of the DExH DNA helicase gene family.
The bioinformatics work exploited software enhancements of the "A Caenorhabditis elegans Database" (ACEDB) genomic database management program that I developed during the thesis project period.

The results of the genomic analysis phase of this thesis included the development of three polymorphic genetic markers, including one for LHRH, an important gene involved in human reproduction. One marker, D8S2297, was placed into the WRN candidate region by genetic and long range physical map construction spanning D8S339, a marker with reported linkage to WRN. The positions of three genes (GTF2E2, GSR and PPP2CB) were refined upon this map.

The bioinformatics phase of the thesis provided molecular phylogenetic analysis to demonstrate orthology relationships of WRN homologous loci. Multiple sequence alignments of orthologous loci provide a good means of elucidating critical amino acid residue motifs with structural or functional roles specific to these genes.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
PREFACE
ACKNOWLEDGMENTS
DEDICATION
I. INTRODUCTION
  A. The Biology of Aging
    1. Werner syndrome
    2. Status of Genomic Maps Spanning WRN at the Start of the Thesis Project
  B. Thesis Overview
    1. Thesis Goal and Objectives
    2. Genomic Map Refinement
      a) Positional Cloning
      b) Fine Scale Mapping
      c) Genetic Mapping
      d) Physical Mapping
      e) Cell and Radiation Hybrid Mapping
      f) Subclone-based Mapping
      g) Transcript Mapping
      h) Mutation Detection in Candidate Genes
    3. WRN: A DNA Helicase?
    4. The Application of Bioinformatics to Genome Data
      a) The Impact of Genome Sequencing
      b) The Role of Bioinformatics in Analyzing and Managing Genome Data
      b) Bioinformatic Analysis of WRN
II. MATERIALS AND METHODS
  A. General Materials and Methods
    1. Project Sources of DNA
    2. DNA Preparations
      a) Growth of Bacterial Based Clones
        i) Luria Bertani (LB) Media
      b) Growth of YAC Based Clones
        i) AHC Media
        ii) YPD Media
      c) Cosmid/Plasmid Mini-Preps
        i) Solution I
        ii) Solution II
        iii) Solution III
        iv) 1xTE8
      d) Cosmid Maxi-Preps
        i) STE
      e) Radiation Hybrid DNA Preps
      f) YAC DNA Preps
      f) YAC DNA in Agarose Plugs
      g) DNA Pools
    3. DNA Restriction Enzyme Digestion
      a) Restriction Enzyme Digestions
        i) Stop buffer (10x)
      b) Restriction Enzyme Digestion of DNA in Agarose Plugs
    4. Agarose Electrophoresis Gels
      a) DNA Size Standards
      b) Ordinary Gel Electrophoresis
      c) Isolation of DNA using DEAE Paper
      d) DNA Fragment Purification
      e) Pulse Field Gel Electrophoresis
        i) PFGE TBE Buffer (10x stock)
    5. Radioactive Detection of DNA
      a) Nick Translation Probe Labeling
        i) Solution A
        ii) Solution C
        iii) Solution D
      b) Random Oligo Probe Labeling
        i) OLB-A
        ii) Solution A
        iii) Solution B
        iv) Solution C
        v) NTSB
      c) Primer (End) Probe Labeling
      d) Colony Lysis/DNA Binding Protocol
        i) 20 x SSC
      e) Southern Blotting
      f) Hybridizations
        i) Hybridization Solution
        ii) Denhardt's Solution
      g) Repeat Hybridizations of Blots
    6. Subcloning
      a) Ligations
      b) Transformations
        i) XIA Plates
    7. Polymerase Chain Reactions (PCR)
      a) Standard Non-Labeled Reactions
      b) Long Range PCR
      c) Primer Labeled Reactions
        i) PAGE Stop Mix
    8. Sequencing
      a) Template Preparation
      b) Pool Preparation
      c) Sequencing Reaction
      d) Polyacrylamide Gel Electrophoresis (PAGE)
  B. Genomic Map Refinement
    1. DNA Sources
      a) Chromosome 8 Cosmid Library
      b) Chromosome 8 Cell Hybrid Panels
      c) Chromosome 8 YAC Libraries
      d) CEPH Reference Family DNA
    2. Coarse Genome Mapping Strategy
      i) Candidate Gene Mapping
      ii) Somatic and Radiation Cell Hybrid Mapping
    3. Genetic Linkage Mapping
      a) STRP Marker Development
      b) Genotyping of Polymorphic Markers into CEPH Families
      c) Linkage Mapping
    4. Physical Mapping
      a) STS Content Screening
      b) Long Range Restriction Analysis
      c) Transcript Mapping
      d) Hybridization Probes
  C. Bioinformatics Applied to the WRN/DExH Gene Family
    1. Hardware
    2. Software
      a) ACEDB for Windows
      b) ACEDB for Gene Family Analysis
      c) Web Site Construction
      d) Specific Genomic Databases and Analysis Software
    3. Data Analysis
      a) WRN Bioinformatics Analysis
III. RESULTS
  A. Genomic Map Refinement in the WRN Candidate Region
    1. Genetic Linkage Map Construction
      a) Linkage Mapping of LHRH
      b) Somatic and Radiation Cell Hybrid Mapping
      c) Isolation of STRPs from "CI" Sublibrary Cosmids
      d) STRPs Mapping in the Vicinity of the WRN Candidate Region
        i) STRP from Cosmid 53C3
        ii) STRP from Cosmid 128B9 (D8S2297)
      e) Integrated Genetic Linkage Analysis of Thesis STRPs
    2. Physical Map
      a) Markers Selected for Physical Map Construction in the WRN Candidate Region
      b) STS Content Map
      c) PFGE of Whole YACs from the WRN Candidate Region
      d) Selection of YACs for Long Range Analysis
      e) Long Range Restriction Map of the WRN Region
      f) Localization Hypothesis: PPP3CC?
      g) Map Data for Deleted YAC 936g4
      h) Physical Mapping Summary
  B. Bioinformatic Analysis of the WRN Gene
    1. Tool Building
      a) ACEDB for Windows
      b) ACEDB Dendrogram Display
      c) ClustalW
      d) Perl Scripts
    2. Analysis of the WRN/DExH Gene Family
      a) Compilation of WRN Homologous Helicases
      b) Comparison of Alternate Molecular Phylogenies for WRN Homologous Genes
      c) Identification of Conserved Residues by Comparison of Orthologous Loci
    3. HelicAceDb
    4. HelicaseWeb
IV. DISCUSSION
  A. Fine Scale Genomic Map Construction Near WRN
    1. Evolution of WRN Candidate Region Genomic Maps
    2. Linkage Mapping of LHRH
    3. A Chromosome Wide Strategy for Simple Tandem Repeat Polymorphism (STRP) Isolation
    4. A Multi-point Genetic Linkage Map
    5. A Long Range Physical Map
    6. Candidate Genes in the Region
  B. Positional Cloning of WRN
    1. WRN, A Putative Helicase
    2. Human Helicase Functions
  D. Bioinformatics Analysis of the DExH Gene Family
    1. Gene Family Assignment and Functional Annotation
    2. Approaches to Assessing Gene Family Membership
    3. Dynamic Modeling of Gene Family Membership
    4. ClustalW
    5. Phylogenomics
    6. Visualization of Molecular Phylogenies
    7. Sequence Analyses of the WRN/DExH Gene Family
      a) WRN is a DEAH Helicase
      b) Identification of WRN Orthologous Loci in Model Organisms
      c) Identification of Critical Residues
      d) Assessment of WRN/F18C5.2 Ortholog Pair
      e) Assessment of WRN Vertebrate Orthologs
  E. HelicaseWeb
  F. HelicAceDb
  G. ACEDB for Windows '95/NT 4.0
    1. Acedb to Windows Port Benefits
    2. Implementation
  H. The Current Status of Werner syndrome Research
  I. Future Directions
    1. Thesis Work
    2. Characterization of Gene Families
    2. ACEDB
    3. ACEDB for Windows
    4. The Future of Bioinformatics
ABBREVIATIONS
BIBLIOGRAPHY
APPENDICES
  A. Published Wagner/Sapru Hybrid Panel Map
  B. CEPH Reference Family Pedigrees
  C. CEPH Reference Family Genotypes for STRPs Characterized in this Thesis
    1. LHRH
    2. cos53C3PA
    2. D8S2297
  D. Functional Analysis Work Undertaken in Caenorhabditis elegans
    Introduction: Functional Analysis in a Model System
      a) Features of the Caenorhabditis elegans Experimental System
      b) The Isolation of a Phenotype from a Genotype
    Materials & Methods
      a) Maintenance of Worm Stocks
        i) NGM Media
        ii) M9 Buffer
        iii) Freezing Medium
      b) WRN Candidate Homologous Loci in C. elegans
      c) F18C5.2 and Associated Mutant Strains
      d) Transgenic Strains
      e) Longevity Study
      f) Crosses for Phenotype Rescues by Transgenic Arrays
    Results
      1. WRN Homologous Loci in Caenorhabditis elegans
        a) T04A11.6
        b) K02F3.1
        c) E03A3.2
        d) F18C5.2
      2. A Cosmid System Containing a WRN Homologous Locus
      3. Is F18C5.2 a Trans-spliced Operon?
      4. Mutations Mapped in the Vicinity of F18C5.2
      5. Construction of Transgenic Worms Carrying F18C5.2
      6. Longevity of Transgenic Worms?
      7. Phenotype Rescue by the F18C5 Transgenic Array?
    Discussion
  E. Computer Program Listings (Perl scripts et al.)
  F. ClustalW (1.7) Multiple Sequence Alignment of WRN-related Helicases
  G. (HTML) Help Document for the New ACEDB Dendrogram Graphic Display
    Dendrogram Tree Display
    Contents
    Tree Data Input
    Navigating through the Tree
    Display Information and Configuration
    Menu Buttons
    Pop-Up Menu items
    Associated ACEDB Models

LIST OF TABLES

Table 1. Sequencing Reaction Pool Reagents (quantities in microlitres)
Table 2. WWW Resources Employed in Bioinformatics Analysis
Table 3. Locally Installed Software for Analysis
Table 4. LHRH STRP Alleles Observed in All CEPH Reference Families
Table 5. CRIMAP Linkage Data for LHRH STRP
Table 6. Hybrid Inter-Alu Product Hybridization Screen Against LA08NC01
Table 7. List of (GT)n Positive "CI" Sub-interval Cosmids
Table 8. PCR Reaction Conditions for "CI" Region Cosmid STRPs
Table 9. cos53C3PA STRP Alleles Observed in CEPH Reference Families
Table 10. CRIMAP Linkage Data for cos53C3PA
Table 11. D8S2297 STRP Alleles Observed in CEPH Reference Families
Table 12. CRIMAP Linkage Data for D8S2297
Table 13. CRIMAP (Sex Equal) Multipoint Linkage Analysis
Table 14. Integrated Genetic Linkage Map for Thesis STRPs
Table 15. STS Systems (other than D8S2297) Employed for Physical Map Construction
Table 16. STS or Gene Loci and Associated Hybridization Probes
Table 17. Southern Blots of AscI and NotI Restriction Digests of WRN Region YACs
Table 18. List of WRN/DExx Family Genes Used in the Analysis
Table 19. ClustalW (Default) Pairwise Alignment Parameters
Table 20. ClustalW (Default) Multiple Alignment Parameters
Table 21. ClustalW (Default) Protein Gap Parameters
Table 22. Genetic Diseases Caused by Mutations in Helicases
Table 23. C. elegans Cosmid Loci Exhibiting High Sequence Similarity to WRN
Table 24. WRN BLAST Search Results Against C. elegans WormPep Database
Table 25. WRN BLAST Search Results Against C. elegans DNA Sequence Database
Table 26. Caenorhabditis elegans Mutant Loci on Genetic Map Spanning F18C5.2
Table 27. Longitudinal Study of Fecundity and Transgenic Array Transmission
Table 28. Longevity Experiment Time Course

LIST OF FIGURES

Figure 1. Carl Wilhelm Otto Werner (1879-1936)
Figure 2. Werner Syndrome Patient
Figure 3. Circa 1994 Cytogenetic Map of the Short Arm of Human Chromosome 8
Figure 4. Confirmed LHRH Exon #2 STS Localization to 8p
Figure 5. Primary STS Screening of LA08NC01 Cosmid DNA Pools for 37F12
Figure 6. Secondary Screen for 37F12 in LA08NC01 Cosmid DNA
Figure 7. LHRH Exon #2 Hybridization against LA08NC01 Colony Blot
Figure 8. EcoRI Digest of LHRH Exon #2 Positive Cosmid 145G1 (3 samples)
Figure 9. Digests of (GT)n Positive 0.6 kb PstI Fragment of 145G1
Figure 10. Portion of a Sequence Gel for the LHRH STRP
Figure 11. Representative Autoradiograph of Genotyping Gel for LHRH STRP
Figure 12. Somatic and Radiation Cell Panel Map
Figure 13. EcoRI Digests of "C1" Sub-Interval Cosmids
Figure 14. Southern of EcoRI Restricted "CI" Cosmids Probed with Oligo (GT)n
Figure 15. Restriction Digests of Cosmid Isolates for WI-7626
Figure 16. Inter-Alu PCR Verification of Human DNA Content in mega-YAC DNA Preps
Figure 17. STS Content of YACs in the Vicinity of D8S339
Figure 18. Whole mega-YAC PFGE
Figure 19. Southern Blot of Whole YAC PFGE Gel with Total Human Genomic Probe
Figure 20. PCR with PPP2CB 3' STS (WI-7626) in Chromosome 8 Cosmids
Figure 21. PFGE of AscI and NotI Uncut, Single and Double Digested YACs
Figure 22. Southern Hybridization of PFGE of Rare-Cutter Restricted YACs
Figure 23. Southern of PFGE of Restricted YACs Probed with GTF2E2 & D8S540
Figure 24. Southern of PFGE of Restricted YACs Probed with PPP2CB
Figure 25. Southern of PFGE of Restricted YACs Probed with D8S1055 & WI-7626
Figure 26. Southern Hybridizations of PFGE of Restricted YAC 896f4
Figure 27. Integrated Genomic Map Spanning D8S339 and WRN
Figure 28. ACEDB "Dotter" Self Plot of the WRN Protein
Figure 29. Peptide Display with Hydrophobic Plot of the WRN Repeat Subsequence
Figure 30. WRN/Dex[DH] Protein Phylogenetic Trees Derived from ClustalW
Figure 31. WRN Protein Phylogenetic Subtree Generated By Bete Analysis
Figure 32. CLUSTALW MSA for RecQ Subfamily Sequences
Figure 33. BLAST2 Pairwise Comparison of the BLM and T04A11.6 Orthologs
Figure 34. CLUSTALW Pairwise Comparison of the BLM and T04A11.6 Orthologs
Figure 35. BLAST2 Pairwise Comparison of the RECQL and K02F3.1 Orthologs
Figure 36. CLUSTALW Pairwise Comparison of the RECQL and K02F3.1 Orthologs
Figure 37. BLAST2 Pairwise Comparison of the WRN and F18C5.2 Orthologs
Figure 38. CLUSTALW Pairwise Comparison of the WRN and F18C5.2 Orthologs
Figure 39. CLUSTALW Pairwise Comparison of the WRN and E03A3.2 Genes
Figure 40. CLUSTALW of Vertebrate WRN Orthologs
Figure 41. Helicase Motifs in the RecQ Gene Family
Figure 42. Schematic of Transgenic Nematode Crosses
Figure 43. MSA Spanning Helicase Domain VI of WRN Homologous Genes
Figure 44. ClustalW Tree from Preliminary Domain VI MSA
Figure 45. ACEDB Gene Feature Map of F18C5.2
Figure 46. ACEDB Gene Feature Map of T04A11.6
Figure 47. ACEDB Gene Feature Map of K02F3.1
Figure 48. ACEDB Gene Feature Map of E03A3.2
Figure 49. PstI Restriction Gel of F18C5 Cosmid DNA
Figure 50. 5' and 3' Locus Specific PCR of F18C5 6.4 kb PstI Subclone
Figure 51. TESS Predicted Promoter in Cosmid C56E6 5' Region Flanking F18C5.3
Figure 52. ACEDB Physical Map for F18C5
Figure 53. ACEDB Inferred Genetic Map Interval for F18C5.2

PREFACE

Portions of this doctoral thesis incorporate material previously published in one of two refereed papers, cited as follows:

Bruskiewich R, Everson T, Ma L, Chan L, Schertzer M, Giacobino JP, Muzzin P and Wood S. (1996) Analysis of CA repeat polymorphisms places three gene loci on the 8p linkage map. Cytogenet. Cell Genet. 73:331-333

Bruskiewich R, Schertzer M and Wood S. (1997) A Long Range Physical Map Spanning the Werner syndrome Region. Genome 40:77-83

In the case of the former paper about CA repeat polymorphisms, this doctoral Candidate was fully responsible for a third of the scientific data presented, specifically data pertaining to the LHRH locus STRP marker.
In the case of the latter paper on the Werner syndrome region, this Candidate was responsible for approximately 80% of the scientific data presented in the paper, including the development of the D8S2297 STRP marker, portions of the STS content map, the long range restriction YAC plug digests, PFGE, hybridization probe development and Southern hybridizations.

Statement Certified by Senior Author: Dr. Stephen Wood

Medical Research Council of Canada grants covering the research documented in this thesis are as follows:

Genetic Linkage mapping on 8p: MRC/CGAT Grant No. GO-12753
Werner syndrome related work: MRC Grant No. 591210
Windows ACEDB Development: MRC/CGAT Grant No. GO-13282

In addition to pure research, this thesis outlines development work undertaken by this Candidate upon ACEDB and elements of other public domain scientific software. This Candidate's contribution to the development of this software is as specifically noted in this thesis. All copyrights and associated legal rights to this software are owned by authors of the software, as noted in this thesis, and not the University of British Columbia. The specific copyright notice for ACEDB is reproduced for reference on the next page.

ACEDB Copyright (C) 1990-1998 - R Durbin and J Thierry-Mieg. All rights reserved; primary code written by Jean Thierry Mieg and Richard Durbin, 1990-1998

WIN32 platform code written and Copyright (C) 1995-98 by R. Bruskiewich. 1995-96 WIN32 software development activities funded by the Canadian Genome Analysis and Technology Program (CGAT) of the Medical Research Council (MRC) of Canada.

Disclaimer: Redistribution and use in source and binary forms are freely permitted provided that the above copyright notice and attribution and date of work and this paragraph are duplicated in all such forms and that neither this software nor software based in whole or in part on this software is sold for profit.
THIS SOFTWARE AND DOCUMENTATION IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

ACKNOWLEDGMENTS

I thank Dr. Stephen Wood and Dr. Ann Rose for the use of their laboratory facilities to undertake this project. I thank my supervisory committee, Dr. Hugh Brock, Dr. Muriel Harris and Dr. R. Keith Humphries, for their guidance during the course of this project. Gratitude is expressed to Dr. Nathan Goodman of the Jackson Lab, who volunteered as an extramural addition to my committee, permitting me to do bioinformatics in my thesis. Special thanks are also extended to another faculty member, Dr. Diana Juriloff, who might just as well have been a formal member of my committee given the extent of her ongoing patient guidance and timely pep talks throughout my graduate journey. I also thank Dr. Jean Thierry-Mieg of CNRS-Montpellier and Dr. Richard Durbin of the Sanger Centre, UK, for the opportunity to collaborate on ACEDB development.

Special thanks to Mike Schertzer of the Wood lab for invaluable collaboration with the genomic mapping, in particular with the early cosmid and radiation hybrid experiments. I also thank the many other post-doctoral, technical and student colleagues in the Wood, Rose and Dr. David Baillie (SFU) labs who joined in the adventure. A thank you is also extended to Dr. Sjolander for the Bete sequence analysis. Finally, thanks go out to other faculty and staff of the Department of Medical Genetics, and all other Departments of the University, who contributed assistance, useful suggestions, encouragement and/or a sympathetic ear during my program.
Earlier portions of this work were undertaken with funding from the Canadian Genome Analysis and Technology program of the Medical Research Council of Canada, alongside the generous ongoing support from the University of British Columbia in the form of a University Graduate Fellowship.

DEDICATION

To my beloved, understanding wife, Florence, who has, with difficult but enduring love, supported me materially and spiritually throughout my graduate studies.

To our beloved children born during my doctoral program, Kenneth Casimir and Marie-Rose Rita, who alternately served as comic relief and a distraction during the long trek towards thesis completion.

To my Parents, who gave me the life and the opportunities for education leading up to this exploration of science.

Finally, I quietly thank the spirit from my past first inspiring me down the path towards this endeavor.

I. INTRODUCTION

A. The Biology of Aging

To varying degrees, genes, environment and developmental stochastic events all contribute to the generation of phenotypes in living organisms (Suzuki et al. 1989). As a biological process, senescence is similarly driven by an interaction of genetic, environmental and developmental events. The specific role of heredity in senescence is clear from the fact that different species have large variations in their maximum lifespan. In the animal kingdom, for example, this can range from one day for the mayfly to 160 years for the Galapagos tortoise (Finch, 1991).

The task of elucidating the genetic basis for aging is, however, not an easy one. It is primarily through experimentation with simpler model organisms, such as Drosophila, Caenorhabditis elegans, and Saccharomyces cerevisiae, that many of the recent advances in the biological study of senescence have been made possible (Rose and Archer, 1996).
However, the variability and complexity of phenotype and genotype relating to senility in higher vertebrates present a serious challenge to a similar understanding of aging in mammals. Human monogenic syndromes that express some of the characteristics of aging have thus intrigued researchers seeking to understand aging in higher animals. Most prominent among these are Werner syndrome and Hutchinson-Gilford syndrome (progeria of childhood), generally judged to be realistic models of accelerated senescence by phenotypic criteria (Brown, 1990). One of these, Werner syndrome, is the focus of this thesis.

Figure 1. Carl Wilhelm Otto Werner (1879-1936). Photo taken in Kiel, Germany, 1904 (from Werner Syndrome & Human Aging; Salk, Fujiwara and Martin, Eds., 1985).

Figure 2. Werner Syndrome Patient. Classical "Before and Afterwards" photographs of a Werner syndrome patient (as a teenager and at 48 years of age; from Epstein et al, 1966).

1. Werner syndrome

In a doctoral thesis in 1904, Otto Werner first described a family with 4 siblings exhibiting the following symptoms:

• Shortness of stature
• Overall senescent appearance
• Graying and loss of hair from age 20
• Cataracts appearing during the third decade
• Pathological skin changes including ulcers
• Muscular and adipose tissue atrophy of the extremities
• Premature cessation of menses

Although there were no indications of consanguinity in the family pedigree, no obvious environmental influence appeared to cause the disease, since a fifth sibling in the family and the parents showed no evidence of the above symptoms. Dr. Werner therefore postulated a genetic origin for the patients' condition.

Werner syndrome (WRN; MIM# 277700) is a rare human autosomal recessive genetic disorder with an adolescent age-of-onset and a pleiotropic progeroid phenotype (Epstein et al, 1966; Yu et al, 1996).
Probands are often brought to attention because of the lack of the pubertal growth spurt, with consequent short stature and hypogonadism. Patients typically exhibit canities and alopecia, atrophic and sclerodermatous skin, peripheral muscular atrophy, osteoporosis, bilateral juvenile cataracts, atherosclerosis, loss of subcutaneous fat, and a high prevalence of diabetes mellitus. There is an increased susceptibility to neoplasms, and a common cause of death is complications of atherosclerosis in the fourth decade.

Certain aspects of the Werner syndrome phenotype, however, do not resemble normal aging, such as hyaluronic aciduria (Kieras et al, 1986), soft tissue calcification, ulceration around the ankles, disproportionately severe osteoporosis of the limbs, and vocal cord changes. Werner syndrome patients display a very high incidence of sarcomas and meningiomas, whereas epithelial tumors are most prevalent in the general population. However, other common late age-of-onset disorders such as Alzheimer's disease and hypertension are not seen at any higher frequency in Werner syndrome patients than in the general population (Epstein et al, 1966; Yu et al, 1996).

At the cellular level, Werner syndrome appears to be a member of the general class of human genetic disorders characterized by genomic instability and/or DNA repair deficiencies. These diseases include xeroderma pigmentosum, ataxia telangiectasia, Cockayne syndrome, Fanconi anemia and Bloom syndrome. Studies of fibroblasts and peripheral lymphocytes from Werner syndrome patients reveal genomic instability and reduced proliferative capacity of cells. Cytogenetic analyses have shown a characteristic abnormality termed "variegated translocation mosaicism", consisting of multiple, stable, and clonal chromosomal rearrangements including inversions, translocations, and deletions (Salk et al, 1981, 1985).
Extensive deletions at the HPRT locus in SV40-transformed Werner syndrome cell lines were observed (Fukuchi et al, 1989; Monnat et al, 1992), presumably related to the high frequency of spontaneous 6-thioguanine resistance observed in peripheral lymphocytes (Fukuchi et al, 1990) and Werner syndrome cell lines (Fukuchi et al, 1985). Other evidence of genomic instability in Werner syndrome includes hypermutable ligation of disrupted plasmids (Runger et al, 1994); elevated non-homologous recombination at the HPRT locus (Monnat et al, 1992); decreased repair of telomeres after UV irradiation (Kruk et al, 1995); and accelerated loss of telomeric repeats (Kruk et al, 1995; Schulz et al, 1996). In contrast to Bloom syndrome, however, Werner syndrome cells do not undergo increased sister chromatid exchange (Gebhart et al, 1988; Melaragno et al, 1995). Also, unlike other DNA repair disorders (e.g. xeroderma pigmentosum, Cockayne syndrome, Fanconi anemia, etc.), Werner syndrome cells do not exhibit hypersensitivity to radiation or chemical mutagenesis (Fujiwara et al, 1977; Gebhart et al, 1985; Nikaido et al, 1985; Stefanini et al, 1989; Webb et al, 1996).

Werner syndrome fibroblasts are also characterized by accelerated senescence in vitro, achieving only about 20 population doublings in culture (Holliday et al, 1985; Salk et al, 1985; Faragher et al, 1993), unlike normal diploid fibroblasts that typically achieve as many as 60 doublings, the so-called Hayflick limit (Hayflick, 1981; Cristofalo et al, 1993). This reduction in proliferation has been attributed to a higher rate of exit from the cell cycle rather than a smaller fraction of cycling cells in a fresh explant (Faragher et al, 1993). Other related evidence has included a decrease of cells in S phase (Tanaka et al, 1979), a slow rate of DNA replication (Fujiwara et al, 1977), and an increased distance between initiation sites along DNA, but a normal rate of chain elongation (Takeuchi et al, 1982).

2. Status of Genomic Maps Spanning WRN at the Start of the Thesis Project

The familial inheritance of this rare disorder follows a simple autosomal recessive pattern, with a high incidence stemming from consanguineous matings in isolated (Japanese) populations (Goto et al, 1992). These observations supported the notion that Werner syndrome is a genetic disorder caused by mutations at a single gene locus, WRN. From such observations, WRN mutant alleles were estimated to have a worldwide frequency, q, of 0.001-0.0047, giving a homozygote incidence, q², of 1-22 cases per million (Epstein et al, 1966).

Based upon initial homozygosity mapping studies of affected probands, WRN was first assigned to the short arm of human chromosome 8 by detection of linkage between the locus and D8S87 (Goto et al, 1992; Schellenberg et al, 1992). A marker, D8S339 (WT-251), was subsequently reported to exhibit linkage to WRN with a maximum LOD score, Zmax, of 16.5 at a recombination fraction of 0.006 (Thomas et al, 1993). Immediately subsequent to the initiation of this thesis project, additional genetic (homozygosity) mapping (Nakura et al, 1994) and physical mapping (Oshima et al, 1994) localized D8S339 within the map interval flanked by the genetic markers D8S87 and D8S137. Linkage disequilibrium and haplotype studies of known chromosome 8p markers near this region, including the GSR locus, provided additional refinements of the mapping data within the WRN candidate interval (Yu et al, 1994).

B. Thesis Overview

1. Thesis Goal and Objectives

The overall goal of this thesis project was to elucidate the location, identity and function of the Werner syndrome locus, WRN. The first research objective was to construct a fine scale genomic map of the candidate region flanked by D8S87 and D8S137, to assist the positional cloning of the locus.
Subsequent to the identification of WRN as a member of the RecQ DExH family of DNA helicases (Yu et al, 1996), the new research objective became the functional analysis of the WRN gene. Two complementary approaches to functional analysis were eventually pursued. First, a study was undertaken of sequence homologous loci in a model metazoan system, the nematode Caenorhabditis elegans. Second, comparative genomic analysis of the WRN/DExH helicase gene family was undertaken using computer-based analysis ("bioinformatics"). This latter analysis exploits the functionality of the Microsoft Windows port from Unix of the "A Caenorhabditis elegans Database" (ACEDB) genomic software, designed and implemented by this writer during his PhD candidacy period.

2. Genomic Map Refinement

a) Positional Cloning

In the case of genetic diseases exhibiting a complex pleiotropic phenotype of unclear biochemical etiology, the best and often only strategy of gene isolation available is positional cloning, the identification of a gene based upon its location on genomic maps (Wicking and Williamson, 1991). Since Werner syndrome is a disease of this type, positional cloning was the approach adopted for locus isolation by the Werner research community. Thus, this research project was initially directed towards the refinement of genetic and physical maps within the candidate 8p interval hypothesized to contain WRN.

b) Fine Scale Mapping

A genome is the entire complement of hereditary material specific to a given living organism. Except for some viruses, genomes are now known to be encoded by identical copies of DNA molecular sequences found replicated in each cell of an organism. A map is a representation of a real entity in symbolic form, representing order and distance between geometric reference points or "landmarks" in the real entity.
In the case of genome maps for higher eukaryotic organisms, the entity is a geometrically one-dimensional, variable sequence of DNA bases in the genome, partitioned into chromosome-level segments. The reference points of these maps may be any one of a number of features of this DNA sequence, which may be distinguished from one another by gross molecular or fine sequence structure, such that they may be assigned to locations ("loci") in the map, ordered relative to one another. A sequence feature at a given locus may come in more than one form or "allele", permitting its use as a "marker" of the genome location, to reveal alternate versions (including losses) of the genetic material (DNA) within a given chromosome, cell, organism or generation of organism.

There are essentially two categories of genome maps, genetic and physical, differing in their fundamental methodology for ordering loci and in their metric for measuring interlocus distances. A genetic map uses statistical estimates of marker recombination frequency to measure interlocus distances, and multi-point or haplotype analysis to determine locus order. A physical map measures real physical distances between loci in nucleotide base pairs, with additional statistical and experimental analysis of (possibly overlapping) genome fragments to determine locus order. The ultimate physical map is the complete DNA sequence of nucleotides of a genome.

A map required to characterize a given interval may be designated as "fine scale" when it proceeds beyond coarse cytogenetic or chromosomal assignment of loci and spans the transition in map resolution between genetic linkage maps (± 1 centiMorgan) and the largest DNA fragments that can be physically cloned using current technology (± 1 megabase in YACs). At this resolution, genetic and physical map construction are complementary and proceed concurrently.
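The genetic map metric can be made concrete with a small sketch. Recombination fractions are not additive over longer intervals, so they are usually converted to map distances with a mapping function. The example below uses Haldane's function, which assumes no crossover interference. The choice of Haldane's function and the function names are illustrative assumptions, not a method stated in this thesis.

```python
import math


def haldane_cM(theta: float) -> float:
    """Map distance in centiMorgans for a recombination fraction theta.

    Haldane's mapping function (assumes no crossover interference):
    d = -0.5 * ln(1 - 2*theta) Morgans, i.e. -50 * ln(1 - 2*theta) cM.
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


def haldane_theta(d_cM: float) -> float:
    """Inverse of haldane_cM: recombination fraction for a distance in cM."""
    if d_cM < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


if __name__ == "__main__":
    # For small theta the distance in cM is close to 100 * theta.
    print(round(haldane_cM(0.01), 4))
```

At the fine scale discussed here (on the order of 1 centiMorgan), the correction is small, which is why recombination fraction and map distance are often used interchangeably at this resolution.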
c) Genetic Mapping

A useful high resolution genetic map requires the compilation of enough genetic markers of adequate polymorphic information content (PIC) to ensure a high probability of informative matings (Botstein et al, 1980), for use in constructing a genetic linkage map by statistical approaches (Ott, 1985). Simple tandem repeat sequences (STRs), such as (CA)n dinucleotide repeats, are abundant within the human genome and are easily detected by oligonucleotide hybridization. Unique pairs of sequences flanking such STRs may serve as polymerase chain reaction (PCR) primers to detect Simple Tandem Repeat Polymorphisms (STRPs) with high PIC (Weber and May, 1989). As such, STRPs are currently the genetic markers of choice in many on-going linkage mapping projects, including this thesis project. By using such STRP systems in PCR to score the presence or absence of given marker alleles in the DNA of a given individual, in comparison to related individuals (parents, grandparents, siblings, children), exchanges of genetic material by recombination may be directly or indirectly inferred.

In order to obtain such meaningful comparisons between genetic markers, it is necessary to construct reference genetic maps based upon universal mapping reagents. For this purpose, a common panel of eight multigenerational human families, the CEPH families, was adopted by the human genomic research community for reference map construction (Dausset et al., 1990; Weissenbach et al., 1992; Gyapay et al., 1994). Observations of the segregation of two or more markers across such multi-generation human families may be collected and analyzed for potential chromosome recombination events.
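The polymorphic information content referred to above can be computed directly from marker allele frequencies. The following is a minimal sketch of the Botstein et al (1980) formula, PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The function name and the example frequencies are hypothetical and are not data from this thesis.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content of a marker (Botstein et al, 1980).

    allele_freqs: iterable of allele frequencies that sum to 1.
    """
    freqs = list(allele_freqs)
    if any(p < 0.0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    # Probability that a random individual is homozygous.
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings between identical heterozygotes, whose
    # offspring are only partly informative.
    correction = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


if __name__ == "__main__":
    # Hypothetical markers: a two-allele system versus a four-allele STRP.
    print(pic([0.5, 0.5]))
    print(pic([0.25, 0.25, 0.25, 0.25]))
```

A marker with more, evenly distributed alleles scores higher, which is the property that makes multi-allelic STRPs attractive for linkage mapping.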
Two-point or multi-point analysis is undertaken using computer software to calculate "logarithm of the odds" ("LOD") scores: the log10 of the ratio of the likelihood of a given set of marker segregation observations given linkage (at a specified recombination fraction, "theta") to the likelihood of the same observations if the two loci are genetically unlinked (Lathrop et al, 1984, 1985; Ott, 1985).

In the current thesis project, some limited genetic mapping was undertaken based upon an early thesis hypothesis concerning the location of the WRN locus. At the very beginning of the thesis project, the human Luteinizing Hormone Releasing Hormone (LHRH) locus had been assigned to the short arm of chromosome 8; however, a precise genetic and physical map location for the gene was unknown. Working from the observation of hypogonadism in many but not all Werner syndrome patients (Epstein et al, 1966), a "contiguous gene deletion" hypothesis was formulated that proposed close physical linkage between WRN and LHRH. It would follow from this hypothesis, if valid, that precisely mapping LHRH would lead to a refinement of the WRN map location. Therefore, I proceeded to isolate an LHRH-associated genetic marker based upon STRP technology. This work is published (Bruskiewich et al, 1996).

As an alternative to the LHRH targeted genetic mapping strategy, I also endeavored to isolate novel polymorphic genetic markers by screening for STRPs in cosmids assigned to physical mapping intervals defined by a coarse radiation hybrid map (see Physical Mapping - Cell and Radiation Hybrid Mapping below). One such marker, designated D8S2297, was verified, by genotyping in the CEPH family panel and subsequent linkage map analysis, to reside in the WRN candidate region of interest in this project (Bruskiewich, Schertzer and Wood, 1997).

d) Physical Mapping

STRPs exhibit another useful feature.
Once a fine scale genetic linkage map is constructed with STRP markers, these same PCR-based markers may serve as sequence tagged site (STS) markers on a physical map (Olson et al, 1989). Such a physical map is built using cell hybrid mapping panels (Wagner et al, 1991; Sapru et al, 1994), radiation hybrid mapping (Cox et al, 1990; Kirchgessner et al, 1995) or by ordering overlapping genomic subclones across the region (Schlessinger, 1990). The latter technique also generates genomic DNA fragments for further direct manipulations.

e) Cell and Radiation Hybrid Mapping

Cell hybrid mapping panels are constructed by the judicious collection of hybrids created by chromosome loss processes subsequent to the fusion of human cells to rodent cells under selective conditions. Often, the source human cells contain translocations of the chromosome in question. A coarse map of overlapping chromosome fragments may be constructed by characterizing selected hybrids using Southern hybridization (Southern, 1975) with locus-specific DNA probes or inter-Alu short interspersed repeat element based PCR products (inter-Alu PCR; Nelson et al, 1989; Cole et al, 1989). Alternatively, the hybrids may be screened by PCR using human-specific sequence tagged site (STS) systems. One of the advantages of these hybrids is that markers do not need to be polymorphic, since only their presence or absence in hybrids is assessed.

In a similar way, radiation hybrids are created by human-rodent cell fusion, except that the source human cells are pretreated with a fixed dose of high energy radiation (typically X-rays) such that partial DNA fragmentation and cell death result. The resulting human DNA fragments are stochastically segregated, lost or retained across several cell generations in their rodent hosts. Again, human-specific STS content techniques may be employed to characterize the resultant hybrid cells for the presence or absence of specific fragments of the human DNA.
The resulting observations are then analyzed using appropriate mapping formulas. These formulas account for the ambiguity between retention of two given STS markers by physical linkage on a single fragment and retention due to independent concurrent inheritance of the two markers on two separate DNA fragments in a given hybrid (Cox et al, 1990).

f) Subclone-based Mapping

Chromosome and radiation cell hybrids are useful mapping reagents; however, due to the multiplicity of retained fragments, genomic instability, structural rearrangements and low DNA preparation titres, they are inadequate for the task of sequence characterization of genomic DNA. For such purposes, subcloned DNA fragments are superior, offering selective control over the subcloned DNA fragment and the potential for monoclonal amplification of DNA quantities. Such subcloned DNA is obtained by screening genomic libraries constructed by DNA restriction and insertion into one of a series of available cloning vectors, such as: cosmids (± 35 kilobases insert size; Collins and Hohn, 1978), P1 bacteriophages (± 100 kilobases insert size; Sternberg, 1990), bacterial artificial chromosomes (BACs; ± 300 kilobases insert size; Shizuya et al, 1992) or yeast artificial chromosomes (YACs; ± 1 megabase insert size; Burke, Carle and Olson, 1987). The libraries are typically arrayed in microtitre plate grids with unique clone addresses. Such plate grids may be replicated onto suitable membranes for hybridization by probe DNA. Alternatively, DNA pooling schemes may be devised to facilitate clone isolation by PCR screening.

The physical map is then extended to flanking genomic regions by detecting overlapping subclones from the genomic library using a variety of "fingerprinting" techniques such as: STS content analysis, inter-Alu PCR (Nelson et al, 1989; Cole et al.
1989) or end-clone probe hybridization (Schlessinger, 1990); and long range "rare cutter" restriction site mapping with pulse field gel electrophoresis (PFGE; Schwartz and Cantor, 1984). The resulting subsets of overlapping genomic subclones constitute "contiguous" collections of subclones, or "contigs". A physical map based on these contigs shows clone insert sizes, intervals of overlap and distances between genomic "landmarks" (STSs, long range restriction sites, CpG islands, transcribed sequences) (Nagaraja, 1992).

The use of the same markers (STRPs as STSs) in the construction of both linkage and physical maps provides a convenient means of aligning the two types of maps. By typing novel STRP markers within pedigrees segregating the locus of interest (in this case, WRN), the two markers flanking the locus within the linkage map automatically demarcate the physical map region likely to contain the gene. Given sufficient refinement in genetic and physical mapping, such flanking markers will ultimately delimit a fragment of genomic DNA of manageable size for transcript mapping.

g) Transcript Mapping

The detection and mapping of transcribed sequences representing candidate genes in the defined genomic region may then be undertaken by a variety of techniques, for example: hybridization of subcloned genomic fragments to Northern blots of specific tissues or to whole mRNA, possibly from a variety of species (to detect evolutionary conservation of single copy DNA, so called "zoo blotting") (Wicking and Williamson, 1991); genomic fragment hybridization to cDNA libraries; direct cDNA solution-based hybridization selection (Ito, Smith and Cantor, 1992); biological screening for gene features such as exons (Duyk et al, 1990; Buckler et al, 1991) or for CpG islands (Wicking and Williamson, 1991; the Wood lab AscI screen used in this thesis to identify a cosmid with GTF2E2); or direct sequencing with computational sequence analysis (Hochgeschwender, 1992).
In recent years, genome-wide expressed sequence tag (EST) maps have been constructed (Hudson et al, 1995; Schuler et al, 1996), permitting the candidate gene mapping approach used in this thesis (with PPP2CB).

h) Mutation Detection in Candidate Genes

Transcribed sequences whose candidacy is supported by map evidence may be individually tested in disease pedigrees for structural or sequence deviations from normal control DNA, using a variety of mutation detection techniques: locus-specific probe hybridization to Southern blots to detect deletions, insertions or modified restriction sites (Southern, 1975); detection of single strand conformational polymorphisms (SSCP) indicative of sequence changes (Orita et al, 1989); enzymatic detection of mismatches of mutated versus normal alleles (Myers, Larin and Maniatis, 1985); or direct sequence comparisons. Such deviations, consistently segregating with the disease phenotype across numerous disease pedigrees, would strongly support the identity of a given candidate gene locus with the disease locus.

Positional cloning of the Werner syndrome locus, WRN, was reported at the completion of the thesis physical mapping work (Yu et al, 1996). In view of that and the lack of extensive patient DNA resources available, I did not pursue mutation screening for the WRN locus.

3. WRN: A DNA Helicase?

DNA and protein BLAST sequence similarity searches (Altschul et al, 1990) in sequence databases reveal that WRN is a candidate member of the Escherichia coli RecQ DExH DNA helicase gene family. The 1432 amino acid protein predicted from the Werner syndrome gene cDNA sequence contains a region with strong sequence similarity to the helicase motifs of the E. coli RecQ-like DNA helicases (Yu et al, 1996). DNA helicases are double helix unwinding enzymes generally involved in processes of DNA replication, transcription and recombination (Gorbalenya and Koonin, 1993).
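The "DExH" in the family name denotes an Asp-Glu-x-His signature, where x is any residue. As a rough illustration only, such a signature can be located in a protein sequence with a pattern scan. The sketch below is not the BLAST-based analysis used in this thesis; the function name and the toy sequence are hypothetical, and a four-residue pattern will also match by chance, so real motif assignment relies on alignment of the full set of helicase motifs.

```python
import re

# D-E-x-H, where x is any residue. The lookahead lets overlapping
# occurrences be reported as well.
DEXH_PATTERN = re.compile(r"(?=(DE[A-Z]H))")


def find_dexh(protein_seq: str):
    """Return (1-based position, matched tetrapeptide) for each DExH hit."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in DEXH_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real helicase.
    toy = "MKTLLVDEAHCWGAQDEVHR"
    for position, tetrapeptide in find_dexh(toy):
        print(position, tetrapeptide)
```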
The RecQ-like family of helicases includes Escherichia coli RecQ (Umezu et al, 1990); two other human homologs, the Bloom's syndrome protein, BLM (Ellis et al, 1995; Ellis and German, 1996) and RECQL (Puranam and Blackshear, 1994); yeast homologs, Saccharomyces cerevisiae Sgs1 (Gangloff et al, 1994; Watt et al, 1995, 1996) and Schizosaccharomyces pombe Rqh1 (Stewart et al, 1997); and several paralogous loci in Caenorhabditis elegans (this thesis). All of these proteins have seven conserved consensus helicase motifs, including the putative ATP-binding domain in motif I and the DExH sequence (where "x" may be any amino acid) in motif II that is thought to bind Mg2+ and interact with the ATPase motif.

The identification of the Werner syndrome protein as a putative helicase leaves open numerous possibilities about its function, such as DNA replication, repair, transcription, recombination, chromosome segregation, or any activity requiring unwinding of the DNA (Yu et al, 1996). RecQ is thought to play a role in a specific recombination pathway in E. coli (Umezu et al, 1990). Loss of function mutations in the candidate yeast ortholog, Sgs1, are observed to interfere with chromosome segregation during mitosis and meiosis. Sgs1 also exhibits epistatic interactions with topoisomerases (Gangloff et al, 1994; Watt et al, 1995).

4. The Application of Bioinformatics to Genome Data

a) The Impact of Genome Sequencing

Expanding sequencing activities worldwide are generating an exploding pool of DNA sequences from a wide range of organisms, in various sequence databases (Science Genome Issues, 25th October 1996; 24th October 1997). The genomes of several organisms are now completely characterized (yeast and many prokaryotes) or nearly completely characterized (Caenorhabditis elegans) (Rowen, Mahairas and Hood, 1996).
Sequencing of the genomes of several other model organisms is proceeding apace, including the classical genetic models, Drosophila melanogaster and Mus musculus, as well as one plant, Arabidopsis thaliana. The human genome itself is now also the subject of increasing scrutiny.

The availability of extensive sequence data for highly related genes suggests that a comprehensive, computational multi-sequence comparative genome analysis and functional annotation of whole gene families may now be feasible. Such analysis would likely yield valuable new insights into the function of individual gene members of the family. In particular, due to this expanding pool of sequence data, a large number of sequences exhibiting statistically significant sequence similarity to segments of the WRN gene are now available for study. In addition to strong candidate orthologs and paralogs, BLAST searches using the WRN protein sequence also reveal a large number of less significant matches to other DNA and RNA helicases. In their totality, the known orthologs, paralogs and more remotely related BLAST matches to WRN form a rich, multi-species dataset of sequence information of potential utility in elucidating the function of the WRN helicase via comparative bioinformatics analyses.

b) The Role of Bioinformatics in Analyzing and Managing Genome Data

One of the growing challenges inherent in the exponential growth of biological information, in particular sequence data, is the problem of data management and accessibility. Fortunately, powerful new computer technologies and metaphors have evolved over the past 20 years, notably the ubiquitous personal computer workstation and the proliferation of Internet based services such as file transfer protocol (FTP), email and the World Wide Web (WWW). The potential for the application of this computer technology in the analysis and management of biological data is being earnestly exploited (e.g.
see the special "Computers in Biology" issue, Science 273(5275), 2 August 1996).

This application of computing to biological (in particular, sequence) analysis started almost 30 years ago, with the publication of the algorithm of Needleman and Wunsch (1970), which was followed in short order by the pioneering contributions of Sankoff (1972, 1975); Sellers (1974); Waterman, Smith and Beyer (1976); Staden (1979); and Smith and Waterman (1981). This early work has come to fruition in the form of practical tools for the contemporary biologist. Software programs like FASTA (Pearson and Lipman, 1988), BLAST (Altschul et al, 1990, 1997), ACEDB (Thierry-Mieg and Durbin, 1990-98) and ClustalW (Thompson, Higgins and Gibson, 1994) provide powerful analytical tools for the practising biologist. Numerous biological databases have sprung up (see the "Database Issue" of Nucleic Acids Research Vol. 26(1), January 1, 1998 for contributions in this area of bioinformatics). These now also include extensively cross-indexed search and retrieval databases such as Entrez (McEntyre, 1998), SRS (Etzold, Ulyanov and Argos, 1996), GeneCards (Rebhan and Prilusky) and XrefDb (Bassett et al, 1997). In an attempt to manage the expanding population of bioinformatics tools, integrated WWW indexes of such tools are being developed, for example, Pedro's Tools and the Baylor College of Medicine's Biologist's Control Panel.

Unfortunately, functional annotation of genes remains labor intensive even when undertaken by biologists with higher than average computer literacy. Ideally, a biologist with only moderate computer literacy should be able to submit novel gene sequences to some integrated bioinformatics interface and obtain as output a comprehensive answer giving the gene family membership of the sequence and associated functional information, as derived by all available tools and evidence.
The answer should also provide a justification of all assumptions and approaches taken to arrive at the answer. Some laudable attempts are being made to address this objective to some extent, such as GeneQuiz or the NCSA Biology Workbench.

c) Bioinformatic Analysis of WRN

This thesis project set the objective of facilitating gene family analysis and annotation by using and assessing computational tools available on the World Wide Web, and by customizing genomic database software (ACEDB) originally ported to Microsoft Windows as an extracurricular activity during this thesis project. In particular, an effort was made to elucidate the one-to-one (or possibly, one-to-many) mapping (in a mathematical functional sense) between gene sequences exhibiting high sequence similarity. Particular attention was placed upon identifying sequences similar by common descent and separated only by speciation (not gene duplication events), that is, truly orthologous genes. This activity, denoted in this thesis as "ortholog mapping", seeks to maximize the quantity and quality of inferences achievable with comparative sequence analysis. For example, correct alignment of true gene orthologs could reveal highly conserved sequence motifs that are unique to the ortholog-defined gene sub-family, and indicative of amino acid residues with critical sub-family-specific structure/function.

II. MATERIALS AND METHODS

A. General Materials and Methods

1. Project Sources of DNA

This thesis project exploited somatic and radiation hybrid cells as well as YAC, cosmid, and bacterial (plasmid) clones of segregated or subcloned DNA, of human and nematode (Caenorhabditis elegans) origin, available in the UBC laboratories of Drs. Stephen Wood and Ann Rose (see "Genomic Map Refinement Materials - DNA Sources" below). Storage of DNA sources (cells or DNA samples) was at 4°C, -20°C, -80°C or under liquid nitrogen, depending upon the nature of the given sample.
Preparation of DNA for restriction, hybridization, PCR or other manipulations was undertaken in a manner consistent with standard methods for each type of DNA source.

2. DNA Preparations

a) Growth of Bacterial Based Clones

Cosmid or plasmid clones were generally inoculated from single colonies into LB broth, with or without antibiotic selection, and grown overnight in a 37°C shaker incubator, in volumes consistent with their intended use. Mini-preps were generally initiated from cultures of 5 ml of autoclaved LB.

i) Luria Bertani (LB) Media

Per litre:
  Bacto-Tryptone        10 g
  Bacto-Yeast            5 g
  NaCl                  10 g
  Dextrose               1 g
  Agar (solid media)    12 g

Dissolve in de-ionized water and adjust to pH 7.0 with 5 N NaOH. Make up volume to 1 litre and autoclave for 20 min at 15 lb/sq. in. on liquid cycle. For solid agar media, add 12-15 g of powdered agar per litre of media, prior to autoclaving. For antibiotic containing media, wait until the media has cooled to less than 50°C prior to addition of the antibiotic.

b) Growth of YAC Based Clones

Yeast artificial chromosome (YAC) clones (Burke, Carle and Olson, 1987; Brownstein et al, 1989) were generally grown in YPD broth (possibly containing 50 mg/ml ampicillin to inhibit bacterial growth) from single large colonies picked off solid AHC media plates (Brownstein et al, 1989) inoculated from -80°C accessioned yeast 15% glycerol stocks. Broths were incubated with vigorous agitation for at least 2 days at 30°C.

i) AHC Media

For 1000 ml, minimal media, ura-, trp-:
  Yeast Nitrogen Base    1.7 g
  Ammonium Sulfate       5 g
  Casein Hydrolysate    10 g
  Dextrose              20 g
  Adenine               20 mg
(pH 5.8) dH2O to 1000 ml; autoclave 15 min., 121°C, 15 psi.

ii) YPD Media

For 1000 ml:
  Bacto-yeast extract   10 g (1%)
  Bacto-peptone         20 g (2%)
  Glucose               20 g (2%)
Add 20 g Bacto-agar (2%) for solid plate media. dH2O to 1000 ml; autoclave 15 min., 121°C, 15 psi.
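The media recipes above are stated per litre, while working batches (for example, the 5 ml mini-prep cultures) are usually smaller, so quantities scale linearly with volume. A minimal sketch of that arithmetic follows, using the YPD quantities from the recipe above. The function and dictionary names are illustrative assumptions.

```python
# YPD components in grams per litre, as listed in the recipe above.
YPD_G_PER_LITRE = {
    "Bacto-yeast extract": 10.0,
    "Bacto-peptone": 20.0,
    "Glucose": 20.0,
}


def scale_recipe(grams_per_litre, volume_ml, agar_g_per_litre=0.0):
    """Scale a per-litre recipe to volume_ml; optionally add agar for plates."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    factor = volume_ml / 1000.0
    scaled = {name: grams * factor for name, grams in grams_per_litre.items()}
    if agar_g_per_litre > 0:
        scaled["Bacto-agar"] = agar_g_per_litre * factor
    return scaled


if __name__ == "__main__":
    # 250 ml of solid YPD (2% agar, i.e. 20 g per litre).
    for name, grams in scale_recipe(YPD_G_PER_LITRE, 250, agar_g_per_litre=20.0).items():
        print(f"{name}: {grams:g} g")
```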
c) Cosmid/Plasmid Mini-Preps

DNA was prepared using a modification of the alkaline lysis protocol of Birnboim and Doly (1979), from laboratory prepared materials or from commercial kit sources. The mini-prep protocol is briefly as follows. One or more aliquots of overnight culture were pelleted in 1.5 ml Eppendorf tubes spun in a microcentrifuge and the supernatant discarded completely. The resulting pellet was resuspended by gentle pipetting in 100 µl of Solution I and left at room temperature for 5 minutes. 200 µl of Solution II (0.2 M NaOH/1% SDS, made fresh) was added and the preparation was mixed gently by inverting the tube several times, then left on ice for 5 minutes, until the solution cleared. 150 µl of Solution III was added and mixed gently by inverting the tube. After further incubation on ice for 5 minutes, the tube was spun in the microcentrifuge for 5 minutes. The supernatant was then transferred to a fresh tube without disturbing the pellet, which was discarded.

Optionally, if strictly SDS-free DNA preparations were desired, the resulting supernatant was extracted with an equal volume of distilled, TE8 buffered, colourless phenol, then spun for 5 minutes in the microcentrifuge. Without disturbing the organic/aqueous interface, the upper aqueous phase was cleanly transferred to a fresh tube. This phenol extraction was followed by extraction with an equal volume of Sevag's solution (24 parts chloroform: 1 part isoamyl alcohol), the tube spun for 2 minutes in the microcentrifuge and the aqueous phase collected into a fresh tube each time.

900 µl of 95% ethanol was added to the resulting supernatant, the tube mixed by inversion, then left at room temperature for 2-5 minutes. After spinning the tube in the microcentrifuge for 5 minutes, the supernatant was removed thoroughly and the pellet was washed with 150 µl of 70% ethanol, then dried for 5 minutes in a SpeedVac centrifuge.
The resulting pellet was dissolved in 25-50 µl of TE (10 mM Tris pH 8, 1 mM EDTA) containing 20-50 µg/ml of RNase A and the sample was stored at -20°C. During later (Caenorhabditis elegans functional study) stages of the thesis project, a commercial DNA isolation kit (Qiagen) was also used for DNA isolation.

i) Solution I
  50 mM glucose
  25 mM Tris·Cl (pH 8.0)
  10 mM EDTA (pH 8.0)
100 ml batches autoclaved for 15 min @ 10 psi.
  4 mg/ml lysozyme (powder added to solution immediately prior to use)

ii) Solution II
  0.2 N NaOH (freshly diluted from 10 N stock solution)
  1% SDS (freshly diluted from 20% stock solution)

iii) Solution III
Per 100 ml aliquot: to 60 ml of (autoclaved) 5 M potassium acetate, add 11.5 ml of glacial acetic acid and 28.5 ml of (autoclaved) dH2O, giving 3 M potassium/5 M acetate @ pH 4.8. Mix tube contents several times by sharp inversion. Incubate on ice for 5 minutes prior to use.

iv) 1x TE8
  10 mM Tris (pH 8.0)
  1 mM EDTA
  dH2O to volume
Autoclave 20 min @ 15 psi/230°C

d) Cosmid Maxi-Preps

Large scale alkaline lysis DNA preparations of Caenorhabditis elegans C50C12 cosmid DNA were undertaken in this project to provide DNA for transgenic worm construction. Solutions I, II and III are as indicated for mini-preps. The general protocol was as follows. 5 ml starter overnight cultures inoculated from a single plate colony were used to inoculate a large 500 ml LB broth containing 50 µg/ml of suitable antibiotic (cosmid vector specific) for shaker incubation at 37°C. For some preps, 1 ml of 80 mg/ml chloramphenicol was added after 5 hours (OD600 between 0.4 and 0.5) if this improved DNA yields. After 12 to 16 hours, cultures were collected by centrifugation at 4000 rpm for 10 minutes at 4°C in a Sorval centrifuge, in large 250 ml bottles. The supernatant was discarded and the pellet resuspended in 35 ml of STE. The solution was transferred to smaller centrifuge tubes and re-centrifuged at 4000 rpm for 10 minutes at 4°C in a Sorval centrifuge.
The pellet was resuspended in 10 ml of ice cold Solution I, then incubated at room temperature for 5 minutes. 20 ml of freshly prepared Solution II was added, the contents mixed by rapid inversion several times and the tube stored on ice for 10 minutes. 15 ml of ice cold Solution III was added, the tube inverted several times to mix, then stored on ice for 10 minutes. Bacterial debris was removed by centrifugation at 15,000 rpm for 25 minutes at 4°C in a Sorval centrifuge. The supernatant was transferred into 2 centrifuge tubes and 0.6 volumes of room temperature isopropanol were added to each tube. The tubes were mixed gently and placed at room temperature for 15 minutes. The DNA was precipitated by 10 minutes of room temperature centrifugation at 2500 rpm, in a clinical centrifuge. The supernatant was discarded and the pellet air dried.

The pellet was dissolved in 10 ml TE (pH 8.0), and 1 g/ml CsCl and 0.8 ml of 10 mg/ml ethidium bromide were added. The sample was placed in the dark for 15 minutes, then transferred to a 16x76 mm polyallomer ultracentrifuge tube and heat sealed. The sample was then centrifuged at 60,000 rpm for 18 to 20 hours at 20°C in an ultracentrifuge. The DNA band was visualized with a long wavelength UV transilluminator and extracted using a syringe. Ethidium bromide was removed from the DNA by several extractions with butanol saturated with dH2O. The DNA was precipitated by adding 2 volumes of dH2O and 6 volumes of 95% ethanol, placing the sample at -20°C for 1 hour to several days and centrifuging for 10 minutes at 2500 rpm at room temperature in a clinical centrifuge. The pellet was washed with 70% ethanol, centrifuged for 10 minutes at 2500 rpm at room temperature, the supernatant removed and the pellet air dried. The DNA was resuspended in 0.5 to 1 ml of TE8.
i) STE
  0.1 M NaCl
  10 mM Tris·Cl (pH 8.0)
  1 mM EDTA (pH 8.0)

e) Radiation Hybrid DNA Preps

Cell hybrid and radiation reduced hybrid clone DNA was prepared for this project by Mike Schertzer, following the method of Blin and Stafford (1976).

f) YAC DNA Preps

YAC DNA was prepared for analysis from broth cultures using a modification of standard protocols (Burke, Carle and Olson, 1987; Brownstein et al, 1989; Bellanne-Chantelot et al, 1992). Yeast pellet plus 1 ml of supernatant from a 2-3 ml YPD culture was microcentrifuged in a 1.5 ml Eppendorf tube for 2 minutes and the supernatant discarded. A digestion mix was prepared from 200 µl of sorbitol solution (0.9 M sorbitol; 0.1 M Tris·Cl (pH 8.0)), 20 µl of Zymolase (50-100 U; Seikagaku Kogyo Co., Tokyo) and 20 µl of 4% v/v β-mercaptoethanol. The pellet was gently resuspended in this 240 µl of solution, the tube sealed with parafilm, incubated at 37°C for 1 hour, then microcentrifuged for 2 minutes, discarding the supernatant in a fumehood. 100 µl YTE (0.1 M EDTA; 10 mM Tris·Cl (pH 8.0)) was mixed with 10 µl of 10% SDS and added to the tube. The tube was then incubated at 65°C for 20 minutes. 40 µl of alkaline lysis mini-prep Solution III was added and the tube microcentrifuged for 3 minutes. The supernatant was retained, discarding the pellet. 900 µl of 95% ethanol was added and the tube microcentrifuged for 15 minutes. The ethanol was completely removed, the pellet dried, then gently resuspended in 100 µl of TE8 containing 20-50 µg/ml of RNase A. The solution was incubated for 1 hour at 37°C. 100 µl of isopropanol was added, the tube microcentrifuged for 10 minutes, the alcohol completely removed and the pellet dried. The resulting pellet was resuspended in 100 µl of TE8.

J) YAC DNA in Agarose Plugs

YAC DNA was isolated and immobilized in agarose plugs for rare-cutter restriction enzyme digestion for PFGE, following modifications of a standard protocol (Anand, Villasante and Tyler-Smith, 1989).
For PFGE experiments, a small (500 µl) aliquot of the source YPD yeast culture was stored in 15% glycerol at -80°C, to serve as a clone source to assess YAC integrity (i.e. clone identity, deletions, etc.). Prior to proceeding, the plug moulds were cleaned in 10% hydrogen peroxide for 5 minutes and rinsed well with dH2O. The bottoms of the plug moulds were taped and placed on ice. A 20 ml YPD yeast culture was collected in a 50 ml Falcon tube by centrifugation for 10 minutes at 2500 rpm in a clinical centrifuge and the supernatant discarded. 5 mg of high grade agarose (InCert LGT) was added to 500 µl of sorbitol solution (1.2 M sorbitol; 20 mM EDTA) and the tube heated briefly to dissolve the agarose and maintained at 42°C. 2.0 µl of β-mercaptoethanol was added to the tube. The yeast pellet was resuspended in a separate 500 µl of sorbitol/β-mercaptoethanol solution. The yeast cell suspension was warmed to 37°C and 2 mg (~100 U) Zymolase (Seikagaku Kogyo Co., Tokyo) added. The agarose solution was quickly added to the yeast suspension, mixed gently with a cut (wide orifice) pipette tip and each well of the plug mould quickly filled. Each plug well holds 100 µl, so approximately 9-10 plugs could be made. The moulds were left on ice for 10 minutes.

In a 2-5 ml Falcon tube, 1 µl of β-mercaptoethanol was mixed together with 1 ml of sorbitol solution and 1 mg Zymolase added. The tape was removed from the bottom of the plug mould and the agarose plugs gently pushed out of the mould, into the Falcon tube, using a blunt (bent) Pasteur pipette tip. The tube was then incubated for 2 hours at 37°C. The supernatant was poured off and replaced with 1 ml of yeast lysis solution (0.1 M EDTA; 10 mM Tris·Cl (pH 8.0); 1% N-Lauryl-sarcosine (sodium salt); 0.2% sodium deoxycholate) containing 1.0 p/ml of Proteinase K. The tube was incubated overnight at 45-50°C. The supernatant was poured off again and replaced with 1 ml of yeast lysis solution. Resultant plugs were stored at room temperature.
Prior to restriction digestion, residual Proteinase K was removed by two 15 minute rinses in 1 ml of wash solution (20 mM Tris-HCl (pH 8.0); 50 mM EDTA), followed by a 1 hour incubation in wash solution containing 1 mM PMSF, followed by two further 15 minute rinses in wash solution. The plugs were then washed twice (15 minutes each) in 1 ml of storage solution (2 mM Tris-HCl (pH 8.0); 5 mM EDTA). These plugs could also be stored in storage solution at 4°C for up to 6 months.

g) DNA Pools

Multi-dimensional pooling of a library of cloned DNA (indexed by plate sets, plate number, rows and columns of 96 well microtitre plates) provides an efficient means of screening a large collection of clones by DNA analytical experiments such as PCR (Heard et al., 1989; Green and Olson, 1990). Such a strategy was exploited in this project for isolating gene or STS clones from cosmid and (commercial) YAC libraries (see "Genomic Map Refinement" below). The alkaline lysis DNA mini-prep protocol was followed, except that plate sets of independent 1 ml cultures in 96 tube boxes were grown, pooled using a multi-channel pipette and collected by centrifugation in 50 ml Falcon tubes. In addition, initial culture pellets were washed in STE and post-Solution III supernatants were subjected to phenol/chloroform extraction. Final DNA preparations were stored in TE at -20°C under a drop of chloroform.

3. DNA Restriction Enzyme Digestion

a) Restriction Enzyme Digestions

Standard restriction enzyme digestions of DNA in solution used 50 ng to 2 µg of DNA, 1x restriction enzyme buffer (generally specified and supplied as 10x stock buffers of a composition set by manufacturers), 0.1 mg/ml bovine serum albumin (BSA, from 10x stock), de-ionized water to volume and 1 to 2 units/µg of restriction enzyme in 20 to 30 µl total reaction volumes. Most digests were performed by incubating reactions for 1 hour at the temperature (generally 37°C) recommended for the enzyme.
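The component volumes for such a standard digest follow from simple dilution arithmetic. The Python sketch below is illustrative only and is not part of the original protocol: the function name, the DNA stock concentration and the enzyme stock concentration are hypothetical inputs, while the 10x buffer, 10x BSA and units/µg figures are taken from the text above.

```python
def digest_mix(dna_ug, dna_conc_ug_per_ul, enzyme_units_per_ul,
               total_ul=20.0, units_per_ug=2.0):
    """Return component volumes (in µl) for one restriction digest.

    dna_conc_ug_per_ul and enzyme_units_per_ul describe the stocks on
    hand; both are assumptions supplied by the user, not thesis values.
    """
    dna_ul = dna_ug / dna_conc_ug_per_ul
    buffer_ul = total_ul / 10.0            # 10x buffer stock diluted to 1x
    bsa_ul = total_ul / 10.0               # 10x BSA stock -> 0.1 mg/ml final
    enzyme_ul = dna_ug * units_per_ug / enzyme_units_per_ul
    water_ul = total_ul - (dna_ul + buffer_ul + bsa_ul + enzyme_ul)
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return {
        "dna": dna_ul,
        "buffer_10x": buffer_ul,
        "bsa_10x": bsa_ul,
        "enzyme": enzyme_ul,
        "water": water_ul,
    }


if __name__ == "__main__":
    # 1 µg of DNA at 0.5 µg/µl, enzyme stock at 10 U/µl, 20 µl reaction
    for name, volume in digest_mix(1.0, 0.5, 10.0).items():
        print(f"{name:>10}: {volume:5.2f} µl")
```

Water is computed last so that it absorbs whatever volume remains, which mirrors the "de-ionized water to volume" step.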
Reaction products to be run in electrophoresis gels were first stopped by the addition of 1/4 to 1/3 volume of stop buffer. Digest products subject to additional manipulations (ligations or double digests) were stopped by the addition of 2 volumes of 95% ethanol followed by microcentrifugation, rinsing in 70% ethanol, drying of the pellet and resuspension in dH2O or TE8.

i) Stop buffer (10x)
0.25% bromophenol blue
0.25% xylene cyanol
40% w/v sucrose in dH2O
60 mM EDTA

b) Restriction Enzyme Digestion of DNA in Agarose Plugs

The protocol for restriction of DNA in agarose plugs (Burke, Carle and Olson, 1987) was a modification of the regular restriction digestions, as follows. Approximately 1/4 of a washed (detergent-free, Proteinase K-free) plug (25 µl) was cut and immersed in 1.0 ml of 10 mM Tris·Cl (pH 7.5) buffer (without EDTA) in a 1.5 ml Eppendorf tube and equilibrated on ice for 30 minutes, with periodic agitation, to dilute the plug EDTA. The buffer was removed, leaving the plug at the bottom of the tube. Plugs were then equilibrated a further 30 minutes on ice with 100 µl of restriction endonuclease buffer (enzyme-specific, as per manufacturers' recommendations). This buffer was replaced with a fresh 100 µl aliquot of restriction buffer containing restriction enzyme (20 units for 4 hour incubations; 5 units for 16 hour (overnight) incubations). The plug was incubated overnight at the recommended restriction temperature. After digestion, the restriction buffer was removed. Plugs requiring double digestion in incompatible buffers were first rinsed in 1.0 ml of 10 mM Tris·Cl buffer (pH 7.5) on ice, with periodic agitation, for 30 minutes. The plug was then equilibrated with the new buffer without enzyme, followed by digestion with the second enzyme in fresh buffer, as described above.
Reactions were stopped by incubating the plug with 1.0 ml of PFGE 1x TBE buffer for 30 minutes on ice, inverting the tube periodically, prior to loading the plug into the PFGE gel.

4. Agarose Electrophoresis Gels

a) DNA Size Standards

For normal (non-PFGE) agarose gels of DNA, either HaeIII-digested φX174 DNA (Bethesda Research Laboratories; BRL; for expected fragment sizes from 72 to 1353 bp) or HindIII/SstII-digested lambda DNA (BRL; for the size range 125 bp to 20 kb) was employed as a size standard. For PFGE long range gels, the mega-YAC host strain, AB1380; intact S. cerevisiae YPH80 yeast strain chromosomes (Life Technologies Megabase I); and/or concatenated lambda DNA markers (Life Technologies Megabase II, Lambda cI857 Sam7 concatamer) were employed as size standards. For some PFGE gels, the host yeast DNA acted as an internal size standard. Intermediate fragment sizes on all gels were estimated by plotting gel migration distance against the logarithm of fragment size (for linear molecules).

b) Ordinary Gel Electrophoresis

Diverse electrophoresis apparatus were employed throughout the project to run agarose gels of various concentrations (0.4 - 2.0%, depending upon fragment sizes of interest), from low voltage (10 V) overnight gels to high voltage (120 V), short period (30 minute) gels. Ethidium bromide was either added to the gel buffer or applied by post-run staining/destaining, and DNA was generally visualized on an ultraviolet transilluminator, with photography by Polaroid film or digital camera. When DNA quantities were low, radioactive probe screening of Southern blots (Southern, 1975) was additionally employed for DNA detection, where noted in this thesis.

c) Isolation of DNA using DEAE Paper

Some experiments relied upon the precise isolation of specific DNA bands of interest from restricted clone DNA. For this, band isolation was undertaken using diethylaminoethyl (DEAE) cellulose paper isolation techniques (Dretzen et al., 1981). Briefly, the protocol was as follows.
First, normal ethidium agarose gel electrophoresis was employed to separate the restriction fragment(s) of interest. DNA exposure to UV light was limited as far as possible to minimize DNA degradation. Next, an ethanol/flame sterilized razor blade was used to make an incision in the gel directly ahead of the leading edge of the band of interest, about 2 mm wider on each side of the band. Using ethanol/flame sterilized (Millipore blunt end) forceps to hold apart the walls of the incision in the gel, a second forceps was used to insert into the slit a piece of DEAE-cellulose (Schleicher & Schuell NA45) paper, pre-cut to the width of an electrophoresis band and the thickness of the gel. The blunt end forceps were removed to close the incision, so as to seal the paper in the gel while ensuring that air bubbles were not trapped. Electrophoresis was resumed at maximum voltage (80-100 volts) until the band of DNA migrated into the DEAE paper (ca. 15-20 minutes). Next, again using ethanol/flame sterilized forceps, the DEAE paper was removed from the gel and immersed in 700 µl of 1 M NaCl solution in a 1.5 ml Eppendorf tube. The tube was then incubated for 45 minutes at 65°C (in a water bath). The DEAE paper was removed and examined on a UV transilluminator to confirm that the DNA was completely eluted, then discarded. 700 µl of isopropanol was added to the saline supernatant and the tube incubated overnight at -20°C. The tube was then microcentrifuged for 15 minutes and the supernatant carefully discarded. Optionally, the pellet was washed with 70% ethanol and recentrifuged. After drying, the pellet was taken up in a suitable volume of distilled water for further use (e.g. 14 µl, for ligations). During later stages of the thesis project, "QIAEX II" (Qiagen) DNA band purification kits were also used to isolate DNA fragments for subcloning.

d) DNA Fragment Purification

DNA fragments separated on a normal agarose gel were isolated to reasonable purity (i.e.
for use as a probe for hybridization) by secondary electrophoresis in Low Melting Point (LMP) agarose. Using an ethanol/flame sterilized scalpel, a gel block containing the fragment (band) from the original electrophoresis run was excised and transferred to a 1% LMP agarose ethidium gel for electrophoresis at low voltage (60 V). The resulting purified DNA band was excised from the LMP gel and immersed in TE8 in a small tube. After boiling the sample for 5 minutes, it was stored at 4°C.

e) Pulsed Field Gel Electrophoresis

Pulsed Field Gel Electrophoresis was employed to separate both intact and restricted YAC DNA chromosomes of very large molecular weight, embedded in agarose blocks. An LKB "Pulsaphor" PFGE apparatus with a hexagonal electrode array was employed for these gels, with conditions as noted for specific experiments (see Results section). Briefly, the general protocol was as follows. Adequate supplies of 10x TBE buffer and autoclaved dH2O were prepared. The apparatus tank was rinsed well with dH2O. The gel support, casting gasket and comb were washed with dishwashing detergent and water, then rinsed well with dH2O. The gasket and comb were rinsed with 95% ethanol. 2.25 litres of 1x TBE was prepared by dilution of 10x stock with autoclaved dH2O. 150 ml of buffer was used to prepare a (0.8 - 1.0%) agarose gel (without ethidium bromide). The remainder of the buffer was poured into the PFGE tank and pre-cooled in the tank for about 1 hour while casting the gel (target temperature: 11-15°C). Some low melting point (LMP) agarose was prepared in 1x TBE. The PFGE gel was cast on a level platform, with the gel comb parallel to the buffer flow slots in the platform. The DNA agarose blocks to be run were loaded into the gel wells using an ethanol/flame sterilized spatula.
The wells contained a small amount of buffer to facilitate loading, and any bubbles under the plug in the well were removed using a pipette tip, prior to sealing the plug in the well with (liquid) LMP agarose. The gel was placed in the tank and the apparatus assembled. Run conditions were programmed in, the buffer cooling loop set flowing, and the voltage turned on for the gel run. Gels were typically run for 24 hours to 3 days at 115-170 volts, with fixed or interpolated (ramping) pulse rates (shorter for small fragments; longer for large fragments). For gel runs longer than 24 hours, the tank TBE buffer was replaced with fresh buffer.

i) PFGE TBE Buffer (10x stock)
Per 1000 ml of buffer:
1 M Tris base (121.1 g)
1 M Boric acid (61.83 g)
20 mM EDTA
dH2O to volume
Autoclave 20 min at 15 psi/121°C

5. Radioactive Detection of DNA

a) Nick Translation Probe Labeling

In nick translation, DNase I introduces single-strand nicks into the DNA, and DNA polymerase I then replaces the unlabeled ("cold") nucleotides downstream of each nick with labeled dNTPs. Complex DNA sources (e.g. Alu PCR probes from cell hybrids) were labeled with [α-32P]-dNTP by nick translation using a commercial kit (BRL) as follows. The radioactively labeled nucleotide was selected (e.g. [α-32P]-dCTP). Approximately 156 pM of radiolabeled dNTP per 50 µl reaction was used. The following reagents were added to a 1.5 ml Eppendorf tube and mixed, on ice: 5 µl of Solution A (e.g. A2 for [α-32P]-dCTP), 1 µg of probe DNA, 50 µCi of [α-32P]-dNTP and dH2O to give a total volume of 45 µl. 5 µl of Solution C was added, mixed gently, microcentrifuged briefly, then incubated at 15°C for 60 minutes. 5 µl of Solution D (stop buffer) was added. Unincorporated nucleotides were removed on a G-25 Sephadex spun column (see "Random Oligo Probe Labeling" below), leaving the probe in the eluted buffer.

i) Solution A
0.2 mM of dNTPs less one (A2: no dCTP; contains dATP, dGTP, dTTP)
500 mM Tris-HCl (pH 7.8)
50 mM MgCl2
100 mM 2-mercaptoethanol
ii) Solution C
DNA Polymerase I 0.4 U/µl
40 pg/µl DNase I
50 mM Tris-HCl (pH 7.5)
5 mM Mg-acetate
1 mM 2-mercaptoethanol
0.1 mM PMSF

iii) Solution D
Stop buffer: 300 mM Na2EDTA (pH 8.0)

b) Random Oligo Probe Labeling

DNA melted into single strands may be labeled by random primer directed synthesis of new complementary strands containing some labeled dNTPs. Probe DNA was labeled with [α-32P]-dATP using the random primer method (Feinberg and Vogelstein, 1984). DNA samples were diluted to approximately 1 ng/µl and boiled for 10 minutes, then placed on ice. A standard labeling reaction consisted of 30 µl (30 ng) of boiled DNA, 10 µl of OLB-A, 5 µl of 1 mg/ml BSA, 1 unit of the Klenow fragment of DNA polymerase and 50 µCi of [α-32P]-dATP. Reactions were held on ice until the addition of the [α-32P]-dATP, then incubated at room temperature overnight. The labeling reaction was stopped by the addition of 1 volume of NTSB, and the unincorporated nucleotides removed by passing the reaction mixture through a Sephadex G-25 spin column. Spin columns were made by placing a 1 ml pipette tip into a collar formed by cutting the bottom and lid off a 1.5 ml Eppendorf tube, then placing the collar and tip into a 12 x 75 mm culture tube. The bottom of the tip was plugged with silanized glass wool and the tip filled with G-25 Sephadex equilibrated in 1/5 TE (pH 8.0). The column was spun in a clinical centrifuge for 2 minutes at 1000 rpm and then transferred to a fresh culture tube. The sample was added and the column spun again for 2 minutes. The column was washed with 2 volumes of 1/5 TE and spun again. Unincorporated nucleotides are retained by the Sephadex gel whereas the labeled probe is excluded, so the probe was collected in the eluted buffer. The activity of the probe was monitored using a Geiger counter.
i) OLB-A
Solutions A:B:C mixed in the ratio of 100:250:150

ii) Solution A
1 ml 1.25 M Tris (pH 8); 0.125 M MgCl2
18 µl 2-mercaptoethanol
5 µl 100 mM dTTP
5 µl 100 mM dGTP
5 µl 100 mM dCTP

iii) Solution B
2 M HEPES (pH 6.6)

iv) Solution C
Hexadeoxyribonucleotides (Pharmacia) suspended in TE at 90 OD/ml

v) NTSB
20 mM EDTA
2 mg/ml salmon sperm DNA
0.2% SDS

c) Primer (End) Probe Labeling

Primers employed in STRP detection (see "Genomic Map Refinement" below) were 5' end labeled using [γ-32P]-ATP as follows. Primer stocks were pre-boiled before use to inactivate any phosphatase present in the solution. The following reagents were combined in a PCR tube per 10 µl volume of labeled primer: 2 µl of 5x T4 kinase buffer, 1 µl of T4 kinase, 5.0 µl of the pre-boiled 10 µM primer stock solution and 2.5 µl of [γ-32P]-ATP label. The tube was incubated at 37°C for 45 minutes, then heated at 95°C for 5 minutes. The primers were stored at -20°C until required.

d) Colony Lysis/DNA Binding Protocol

A modification of the method of Grunstein and Hogness (1975) for bacterial colony growth and lysis directly on hybridization membranes for probe hybridization was undertaken as follows. A circular (Petri dish sized) nitrocellulose blotting membrane was labeled using an appropriate (persistent) membrane marking pen (Schleicher & Schuell) to include the date, name of organism(s) and the intended locations (or a coordinate system) of the sites of organism inoculation (step 2) on the membrane. The inoculation sites were gridded at least 10 mm apart to avoid culture growth overlap. Holding the membrane with forceps, it was carefully overlaid on the surface of the agar plate, while avoiding the trapping of air bubbles between the membrane and the agar.
The source organism was inoculated directly onto the blotting membrane at the labeled inoculation sites using a flame sterilized wire loop inoculator, an autoclave sterilized wooden toothpick (for large numbers of colonies), or an equivalent inoculator. Only a pinpoint spot of inoculum was required, to avoid colony overgrowth. When isolation of specific clones on the basis of colony hybridization results was intended, a non-membrane replica plate was inoculated with identically gridded inoculum, concurrently with membrane plate inoculation. The plate was incubated overnight upside down at the designated temperature (generally, in a 37°C incubator) to grow the organism(s). The replica plate was stored at 4°C under parafilm seal. Following incubation, the membrane was carefully removed from the surface of the plate using clean forceps and placed, colony side up, onto a piece of blotting paper (Whatman 3MM) presoaked with alkaline denaturation solution (1.5 M NaCl, 0.5 M NaOH). The membrane was left on the blotting paper for 4 to 8 minutes, or until the colonies became moist and syrupy, with a shiny aspect. To avoid "bleeding" of the colony DNA, care was taken to keep the denaturation solution under the membrane and away from the colony side of the filter. The membrane, colony side up, was transferred onto a piece of blotting paper presoaked with neutralization buffer (1.5 M NaCl, 1 M Tris·Cl pH 7.4) and left there for 3 - 5 minutes. The membrane was then rinsed for approximately 1 minute in 2x SSC solution (10 fold dilution of 20x SSC - see below). Next, the membrane, colony side up, was placed upon dry blotting paper to air dry for up to 30 minutes. Then, the membrane was placed between 2 sheets of blotting paper and baked at 80°C under vacuum for 1½ to 2 hours. Baked membranes were stored dry in a plastic (hybridization) bag until use.
i) 20x SSC
3 M NaCl
0.3 M Sodium Citrate (pH 7.0)

e) Southern Blotting

A modification of the method of Southern (1975) for transferring electrophoresis gel DNA onto a membrane for probe hybridization was undertaken as follows. Subsequent to running and photographing a given agarose electrophoresis gel, the gel was trimmed at the wells and to adequate dimensions to capture all lanes and anticipated band sizes, then a piece of Hybond N+ membrane (Amersham) was cut and labeled to these dimensions. For gels with bands over 10 kilobases in size, a partial acid hydrolysis was performed in 0.25 N HCl for 15 minutes at room temperature, followed by a dH2O rinse. All gels were then denatured in 0.6 M NaCl, 0.4 N NaOH at room temperature for 40 minutes and rinsed in dH2O. The gel was neutralized by two consecutive 20 minute rinses with 1.5 M NaCl, 0.5 M Tris-HCl (pH 7.5) buffer. The gel was then rinsed in 20x SSC (3 M NaCl, 0.3 M Sodium Citrate (pH 7.0)). While the gel was soaking in the neutralization buffer, the blotting apparatus was prepared. Two strips of blotting paper "wicks" (Whatman 3MM or equivalent) were cut to the width of the gel and long enough to span the glass plate used to support the gel, with their ends immersed in 10x SSC buffer poured into a Pyrex dish. Three other pieces of blotting paper were cut slightly larger than the membrane dimensions. These three pieces and the two wicks were pre-soaked in the 10x SSC solution. The two wicks were positioned on the glass plate, and an additional dry blotting paper was cut to gel size. When the gel neutralization wash was completed, the gel was placed upon the surface of the wicks, squeezing out any air bubbles from under the gel. The Hybond N+ membrane, with all edges and notch in alignment with the gel, was carefully overlaid upon the gel. A glass pipette or rod was used as a roller to eliminate air bubbles from under the membrane.
The three wet blotting papers, then the dry blotting paper, were overlaid upon the gel. The Pyrex dish was sealed up to the edges of the gel and membrane sandwich with Saran wrap or equivalent water proofing. A pile of dry paper towels was placed over the dry blotting paper surface and weighed down with a light weight (glass plate or equivalent). The blot was left to develop for 8-24 hours, replacing the paper towels when wet (if possible). The apparatus was then disassembled and the flattened gel checked for complete DNA transfer by illumination under UV. The membrane was briefly rinsed with distilled water to remove any agarose and placed between 2 sheets of blotting paper. The membrane was baked at 80°C under vacuum for 1½ to 2 hours and then stored dry in a plastic (hybridization) bag until use.

f) Hybridizations

Radiolabeled probe hybridizations of colony or Southern blots were as follows. Membranes were pre-incubated with hybridization solution at 65°C for 1 to 4 hours to block non-specific probe binding. This step is primarily required for complex DNA sources such as genomic or somatic cell hybrid blots. Labeled probe was denatured by boiling for 5 minutes, followed by quick chilling on ice. Repetitive (human) probes were pre-annealed with a vast excess of sheared non-radioactive human (e.g. placental) DNA. Following heat denaturation of the probe, a suitable quantity of radiolabeled probe (e.g. 50 µl containing about 30 ng of labeled probe) was combined with 150-200 µg (15-20 µl of 10 µg/µl stock) of sheared non-radioactive human DNA and 5.5 µl of 25x SSC (final buffer concentration of about 2x SSC). The probe was then boiled for 5 minutes at 95°C and pre-annealed at 65°C for 15 minutes to 1 hour (nominally 40 minutes) prior to addition to the hybridization bag. Approximately 5 - 10 ml of pre-warmed hybridization solution was added to the hybridization bag containing the membranes to be probed.
Using suitable precautions against radioactive contamination of the environment, the probe was added to the hybridization bag, which was then sealed completely using the bag sealer. The bag was then incubated within a Tupperware container in the water bath, preset at the desired hybridization temperature (65°C for most hybridizations; 55°C for (GT)n hybridizations). The hybridizations were generally left overnight (18 - 24 hours). After hybridization, the membranes were removed from the hybridization solution and the solution disposed of in the hot sink, or retained for another hybridization, if desired. The membranes were then washed as follows:

• For "normal" hybridizations: Low stringency wash: 5 minutes at room temperature, in 1x SSC, 0.1% SDS; High stringency wash: 45 minutes at 65°C in pre-warmed 0.2x SSC, 0.1% SDS. If substantial background signal was observed, the high stringency wash was repeated with fresh buffer.
• For (GT)n hybridizations: Low stringency wash: 45 minutes at 55°C, in 1x SSC, 0.1% SDS.

The membrane was briefly air dried on blotting paper, but kept moist if membrane stripping and re-probing were anticipated. The radioactivity of the membrane was monitored by Geiger counter to decide upon the duration of autoradiography. The membrane was then wrapped in Saran wrap and taped securely inside a film cassette. A suitable (PDB-1) X-ray film was inserted (in a darkroom) into the cassette and exposed to the membrane for a period of time depending upon signal strength. When intensifying screens were used because low signal strength was anticipated, the cassette was placed in a -70°C freezer for the duration of the exposure. The film was developed as prescribed by the film manufacturer.

i) Hybridization Solution
Basic formulation: 6x SSC; 0.3% SDS; 5x Denhardt's Solution (for 500 ml: 150 ml of 20x SSC; 15 ml of 20% SDS; 50 ml of 50x Denhardt's solution).
For (GT)n hybridizations: use basic formulation as shown.
For all other hybridizations, add sheared salmon sperm DNA to a final concentration of 100 µg/ml (50 ml of stock).

ii) Denhardt's Solution
For 500 ml: combine 5 g of Ficoll; 5 g of polyvinylpyrrolidone (PVP40); 5 g of BSA (Pentex Fraction V); dH2O to 500 ml. Filter sterilize through a disposable micropore filter and store 50 ml aliquots at -20°C.

g) Repeat Hybridizations of Blots

Membranes can generally be re-hybridized with a series of probes if precautions are taken to keep the membrane moist during and after hybridization and washing. Such membranes were stripped by treatment at 45°C with 0.4 N NaOH for 30 minutes, followed by neutralization (0.1x SSC; 0.1% (w/v) SDS; 0.2 M Tris-HCl (pH 7.5)) for 15 minutes. Autoradiography for a normal exposure time was used to confirm that the membrane had been properly stripped.

6. Subcloning

a) Ligations

Generally, 14 µl of the dH2O-dissolved digested DNA fragment (generally DEAE isolated or LMP purified) was mixed with 4 µl of 5x ligation buffer (supplied by the ligase enzyme manufacturer), 1 µl of 10 ng/µl complementarily digested Bluescript II KS (Stratagene) vector and 1 µl (1 Weiss unit/µl) of T4 ligase enzyme (Gibco-BRL), giving a total reaction volume of 20 µl. The reaction was generally incubated overnight at 16°C. Positive/negative ligase control reactions with 1 µl of Bluescript vector only as the DNA substrate were also performed.

b) Transformations

Ligations were transformed into bacterial hosts as follows. Labeled 15 ml Falcon tubes were cooled on ice, one for each ligation product to be transformed (including +/- ligation controls and a pNEB193 supercoiled plasmid transformation control at 2 ng/µl). 50 - 100 µl of freshly thawed DH5α transformation competent cells (BRL) were added to each tube. Suitable volumes of ligation mixtures were gently added to their respective tubes (1 µl of pNEB control plasmid) and the tubes incubated for 30 minutes.
The tubes were then heated for 45 seconds at 42°C and placed back on ice for 2 minutes. 400 µl of LB broth was added and the tubes incubated for 45 minutes at 37°C. Transformation cultures were plated out in 120 µl aliquots on suitable plates: ligation products on XIA plates, controls on ampicillin plates. White colonies on XIA plates were picked as putative subclones containing inserts disrupting the galactosidase activity of the Bluescript vector. Colony counts were taken to provide estimates of transformation efficiencies.

i) XIA Plates
LB Agar with:
62.5 µg/ml X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside)
150 µg/ml IPTG (isopropyl-β-D-thiogalactopyranoside)
50 µg/ml Ampicillin

7. Polymerase Chain Reactions (PCR)

a) Standard Non-Labeled Reactions

Specific PCR conditions (e.g. Tm and MgCl2) were as noted in the Results section of this thesis; however, unless otherwise noted, standard 25 µl PCR reactions generally consisted of 10-100 ng of DNA in PCR buffer (50 mM Tris-Cl (pH 8.3), 0.02% Nonidet P-40 and 0.02% Tween 20), 10 pmol of each primer, 1.0 - 2.5 mM MgCl2, 200 µM of each dNTP, and 0.2 U of Taq polymerase (BRL) or VentR (exo-) DNA polymerase (New England Biolabs; NEB). Standard PCR cycling conditions were 35 cycles of 94°C for 30 seconds, 58°C for 30 seconds and 72°C for 1 minute.

b) Long Range PCR

For some "long range" PCR reactions spanning the Caenorhabditis elegans cosmid locus F18C5.2, the "Expand" Long Template PCR System kit (Boehringer Mannheim) was used with buffer system #3 (detergents, 500 µM of each dNTP), following the protocol recommended by the manufacturer.

c) Primer Labeled Reactions

STRP detection involved PCR reactions using end labeled primers, with the following protocol modifications. DNA templates and (unlabeled) primer were boiled briefly to inactivate phosphatases. The PCR pool was prepared on ice without the end labeled primer, the latter being added immediately prior to loading the reaction into the PCR machine.
PCR run conditions were as noted in the Results section for each experiment. Upon completion of the reaction, PAGE stop mix was used to arrest the reaction and the reactions were stored at -20°C until run on a sequencing gel (see "Sequencing" below). Since the reactions are 32P end labeled, they could be stored for several weeks prior to use.

i) PAGE Stop Mix
Per 10 ml:
10 µl of 10 N NaOH (10 mM NaOH)
9.5 ml of 95% formamide
0.05% bromophenol blue
0.05% xylene cyanol

8. Sequencing

Sequencing during this project was undertaken using a modification of the Sanger enzymatic dideoxy nucleotide (ddNTP) termination sequencing technique (Sanger, Nicklen and Coulson, 1977). The sequencing reactions of mini-prep DNA were performed using materials from a commercial source ("Sequenase" 2.0; United States Biochemical Corporation), with sequencing products radiolabeled using [α-35S]-dATP for sequence ladder visualization by autoradiography of polyacrylamide sequencing gels. Briefly, the sequencing protocol was as follows.

a) Template Preparation

12 µl of (phenol/chloroform cleaned or equivalent) template DNA was aliquoted into a 500 µl (non-silanized "Click Seal") microtube and 6 µl of dH2O added. 2 µl of 2 N NaOH/2 mM EDTA solution was added and mixed by pipetting. The tube was then incubated for 5 minutes at room temperature. 2 µl of 2 M ammonium acetate (pH 5.4) solution was added and mixed by pipetting. 45 µl of 95% ethanol was then added and the tube shaken well to mix. The tube was microcentrifuged at 4°C for 15 minutes. While spinning the template, "Pool" reagents (see Pool Preparation below) were prepared. All liquid was drawn off and 100 µl of 70% ethanol added to the tube, which was then microcentrifuged at 4°C for 5 minutes. All the ethanol was removed and the pellet thoroughly dried down in a SpeedVac for 8 minutes.

b) Pool Preparation

While spinning the template, the [α-35S]-dATP was removed from the freezer and thawed out.
Four 500 µl Click Seal tubes per template were labeled with T, G, C and A. The dGTP labeling mix was prepared as a 1/5 (for "far" sequencing) to 1/10 (for "near" sequencing) dilution of the stock solution in dH2O. Pool reagents were prepared (Table 1) and stored for under 45 minutes on ice. Sequenase reaction buffer, ddNTPs and stop mix were thawed on ice.

Table 1. Sequencing Reaction Pool Reagents (quantities in microlitres)

# of templates sequenced:             1        2        3        4
DTT:                                  1.025    2.05     3.075    4.1
dGTP labeling mix dilution:           2.05     4.10     6.15     8.20
[α-35S]-dATP Label:                   1.00     2.00     3.00     4.00
DMSO:                                 0.5125   1.025    1.5375   2.050
Sequenase Enzyme Dilution Buffer:     1.8      3.6      5.4      7.2

c) Sequencing Reaction

6.5 µl of dH2O was added to the template DNA pellet, which was dissolved by pipetting, without introducing bubbles. 0.5 µl of a 10 micromolar solution of the selected primer oligonucleotide was then added to the tube, followed by 1.0 µl of DMSO. The template tube was boiled for 3 minutes, then placed into liquid nitrogen for 5 minutes. The Sequenase Reaction Buffer was equilibrated to room temperature during this time. Tubes were thawed, by finger warmth, to the point of solution clarity, then immediately spun down briefly, and 2 µl of Sequenase Reaction Buffer was quickly added with brief, gentle mixing. The tube was then incubated at room temperature for 5 minutes. During the incubation, 2.5 µl of each ddNTP (C, T, G, A) was placed into its corresponding labeled tube. 0.25 µl per template of Sequenase was added to the Pool reagent tube. Once the 5 minute room temperature incubation of the template was completed, 6.3 µl of Pool reagents (with Sequenase) was added to the template tube, mixed briefly and the tube incubated for 4 minutes. After 3½ minutes, a 3.5 µl drop of the above mixture was placed on the side of each of the labeled ddNTP-containing tubes. At 4 minutes, the tubes were capped and spun down briefly. The tubes were then incubated at 37°C for 4 minutes.
Upon completion, 4 µl of Stop Mix was aliquoted to each tube, and the tubes were capped then spun down briefly. The sequencing reactions could be loaded immediately into a sequencing gel or stored at -20°C for about 2 weeks. 3-4 µl of a given reaction was run per long range sequencing gel lane.

d) Polyacrylamide Gel Electrophoresis (PAGE)

PAGE gels were employed in this thesis project for both 35S based sequencing product and 32P based STRP analyses. A PAGE gel was prepared as follows. First, 25.25 g of urea was weighed out into an Erlenmeyer (250 ml) flask and 27.0 ml of dH2O added to the flask. A stock of 25% (w/v) ammonium persulfate (APS) was prepared (0.25 g of APS added to 1.0 ml of dH2O in a 1.5 ml Eppendorf tube; stored at 4°C for short periods of time). 7.3 ml of 5x TBE (per litre: 54 g Tris base, 27.5 g boric acid, 20 ml 0.5 M EDTA (pH 8.0) or 3.75 g of EDTA disodium salt) was dispensed into the flask, followed by 6.0 ml of (Ultrapure Bioreagent) 50% concentrate modified acrylamide gel solution. The crystals in the flask were then dissolved by running hot tap water around the flask in a sink for about 10 minutes. The gel apparatus was cleaned and assembled as required. Inner glass surfaces were silanized periodically. The lower gel plug was cast by aliquoting out about 13 ml of gel solution and simultaneously adding 200 µl of APS and 52 µl of TEMED to the gel solution, swirling gently, then pouring the plug. The plug was left to harden. 106 µl of APS and 26 µl of TEMED were added to the remaining gel solution. This solution was mixed by swirling, then dispensed into the gel apparatus using a 25 ml mechanically suctioned pipette, in a manner ensuring that no bubbles were created in the gel. The square edge of the shark tooth comb was then inserted into the top of the gel and covered with residual gel solution. The sequencing gel was left to set for about 1 hour in a slightly inclined position.
0.6x TBE buffer was prepared (90 ml of 10x TBE in 1410 ml dH2O, giving 1500 ml of 0.6x TBE buffer). The bottom reservoir of the apparatus was filled with just enough buffer to cover the bottom electrode. The cast gel was assembled into the apparatus and the upper gel reservoir filled with buffer. The comb was removed and the top of the gel flushed out with 0.6x TBE. The upper electrode was placed into position and the power supply turned on at maximum voltage, with power dissipation limited to 55 Watts. Gels were pre-run without samples for about 45 minutes, until the gel temperature reached 50°C. About 30 minutes into the pre-run, sequencing reaction samples were heat denatured by boiling for 5-10 minutes, followed by quick chilling on ice. The power was turned off, the top of the gel flushed again with buffer, and the shark tooth end of the comb inserted so that it just touched the edge of the gel. Using a 0.5-10.0 µl micropipettor fitted with special (long, flat) gel loading tips, 3-4 µl of sequencing reaction was quickly loaded per shark tooth comb well, avoiding air bubbles in the wells beneath the sample. The upper electrode was replaced and the power supply turned back on. Gels were run for about 1 hour, until the lower gel marker reached the bottom of the gel. Often, a second set of sequencing reactions was loaded onto the gel after 1 hour and the gel run for an additional hour, to obtain extended sequence data from overlapping sequence ladder separations.

Prior to run completion, a piece of blotting paper was cut large enough to cover the gel. At run completion, the power supply was turned off. The upper electrode and the shark tooth comb were then removed, the gel apparatus disassembled and the run buffer discarded in the radioactivity sink. The upper glass plate was carefully removed, leaving the sequencing gel stuck to the lower glass plate. The blotting paper was then carefully aligned over the gel and laid upon it.
The blotting paper was then carefully peeled back, with the gel stuck to it, and placed, paper side down, on the bench. The gel was covered with Saran Wrap and the excess gel trimmed away. The gel was then dried in the gel dryer for 25-30 minutes under vacuum at 80°C. The Saran Wrap was discarded and the gel monitored for radioactivity by Geiger counter. The gel was then placed in a sequencing autoradiography cassette and a suitable (Kodak PDB-1) X-ray film exposed for 1 day to 2 weeks at room temperature, depending upon the strength of signal on the gel. The autoradiograph was then developed and the sequence ladder read.

B. Genomic Map Refinement

1. DNA Sources

The thesis project exploited several chromosome 8 specific or enriched resources of subcloned and/or segregated genomic DNA available in the laboratory of Dr. Stephen Wood.

a) Chromosome 8 Cosmid Library

A 20,000 clone chromosome 8 cosmid library, LA08NC01, previously characterized by the Wood lab (Wood et al, 1992), was employed for STRP marker isolation and for detailed genomic mapping. This library consists of approximately 4 genomic equivalents of fluorescence-activated flow-sorted (Deaven et al, 1986) chromosome 8 DNA obtained from the human/hamster cell line UV20HL21-27 (containing human chromosomes 4, 8 and 21). Flow sorted DNA was partially digested with Sau3AI and inserted into the BamHI cloning site of the sCos-1 vector. Cosmids were packaged and transformed into the E. coli host DH5αMCR. Individual kanamycin resistant colonies were picked and transferred into individual wells of 96 well microtitre plates containing LB broth, grown overnight at 37°C, then supplemented with glycerol to 40% and stored at -70°C. The resulting library was arrayed in a total of 208 microtitre 96-well plates. At the start of this thesis project, the 96 well microtitre plates of the LA08NC01 cosmid library were replicated and DNA pools established (see DNA Pool protocol above).
The DNA pools consisted of a series of 26 blocks of 8 plates, independently pooled by block, plate, row and column number to make a 4 dimensional pool set for clone isolation by PCR screening with specific STRP markers and genes using locus-specific PCR amplification systems.

Hybridization resources were also prepared for screening of the cosmid library. Using a Biomek 1000 robotic workstation (Beckman Instruments Inc., Fullerton CA), high density colony blot filters were fabricated by inoculating Hybond N+ membranes (Amersham), overlaid upon LB media, from cosmid library clones arrayed in 96 well microtitre plates. The membranes were incubated until satisfactory colony growth was observed, then subjected to the colony lysis/DNA binding protocol. Each high density filter contained all the clones from up to 16 microtitre plates (16 x 96 = 1536 clones), providing for efficient hybridization screening of libraries with thousands of clones. A 35S-labeled cosmid vector-specific probe served to highlight colonies in autoradiographs. The results of high density filter hybridizations were compiled using the grid display functionality of the database software, ACEDB.

b) Chromosome 8 Cell Hybrid Panels

Cell hybrids 1HL12, 20xPO435-2, 2N2 and 1E1 from published chromosome 8 hybrid panels (refer to Appendix 0; Wagner et al, 1991; Sapru et al, 1994), plus two additional radiation-reduced chromosome 8 hybrid clones, 50N and 501 (Kirchgessner et al, 1995), characterized by Mr. Mike Schertzer of our lab by STS content analysis, were employed as mapping resources to "bin" cosmid subclones into specific sub-intervals in the WRN candidate region bounded by D8S87 and D8S137 (Bruskiewich, Schertzer and Wood, 1997).
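The four-dimensional pooling of the LA08NC01 library described above can be illustrated with a short sketch. The Python below is illustrative only: it assumes that plates are numbered 1 to 208 and grouped consecutively into the 26 blocks of 8, and that clone names follow the plate/row/column form used in this thesis (for example 145G1). The function names are hypothetical and are not part of any software used in the project.

```python
import re

N_BLOCKS = 26          # from the text: 26 blocks ...
PLATES_PER_BLOCK = 8   # ... of 8 plates (26 x 8 = 208 plates)
ROWS = "ABCDEFGH"      # rows of a standard 96-well plate
N_COLUMNS = 12         # columns of a standard 96-well plate


def clone_to_pools(clone):
    """Return the (block, plate-in-block, row, column) pools holding a clone.

    Assumes consecutive plate numbering within blocks; this layout is an
    illustrative assumption, not a documented detail of the actual pools.
    """
    match = re.fullmatch(r"(\d+)([A-H])(\d+)", clone)
    if match is None:
        raise ValueError(f"unrecognized clone name: {clone!r}")
    plate = int(match.group(1))
    row = match.group(2)
    column = int(match.group(3))
    if not 1 <= plate <= N_BLOCKS * PLATES_PER_BLOCK:
        raise ValueError(f"plate number out of range: {plate}")
    if not 1 <= column <= N_COLUMNS:
        raise ValueError(f"column number out of range: {column}")
    block = (plate - 1) // PLATES_PER_BLOCK + 1
    plate_in_block = (plate - 1) % PLATES_PER_BLOCK + 1
    return block, plate_in_block, row, column


def pools_to_clone(block, plate_in_block, row, column):
    """Inverse mapping: one positive pool per dimension identifies one clone."""
    plate = (block - 1) * PLATES_PER_BLOCK + plate_in_block
    return f"{plate}{row}{column}"


# Total addressable wells in the arrayed library.
TOTAL_WELLS = N_BLOCKS * PLATES_PER_BLOCK * len(ROWS) * N_COLUMNS
```

Under these assumptions, a clone is identified from one positive pool in each of the four dimensions, so a single-copy locus can in principle be resolved with far fewer PCR reactions than the number of wells in the library.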
c) Chromosome 8 YAC Libraries

A chromosome 8 sub-library of the Imperial Cancer Research Fund (ICRF) genomic yeast artificial chromosome (YAC) library (Larin, Monaco and Lehrach, 1991) and a chromosome 8 subset of the Centre d'Etude du Polymorphisme Humain (CEPH) genomic mega-YAC library (Bellanne-Chantelot et al, 1992) were employed for long range genomic map construction. Contigs were constructed using YAC clone assignments to chromosome 8 by Genethon (Cohen, Chumakov and Weissenbach, 1993; Chumakov et al, 1995) and by STS content screening of a commercial mega-YAC DNA pool (Research Genetics), using markers assigned by this thesis project, or by others, to the WRN candidate genomic interval (Bruskiewich, Schertzer and Wood, 1997).

d) CEPH Reference Family DNA

DNA isolates from eight CEPH reference families (102, 884, 1331, 1347, 1362, 1413 and 1416; Dausset et al, 1990) were employed in this thesis project as reagents for linkage analysis.

2. Coarse Genome Mapping Strategy

At the time this thesis project was initiated, the preferred mapping hypothesis was that WRN was likely to reside in the genomic interval flanked by the genetic markers D8S87 and D8S137. Published genetic map data at that time indicated an interval size of approximately 10 centiMorgans (Figure 3), too large an interval to permit immediate positional cloning of WRN. The initial research objective of this thesis therefore focused upon better characterizing this large genomic interval with new polymorphic linkage markers generated from two coarse genome mapping strategies.

i) Candidate Gene Mapping

Candidate gene hypotheses based upon biological reasoning may be employed to direct positional cloning of genes. This project examined the human GNRH (LHRH) locus under one such hypothesis.
The resulting mapping strategy was to isolate a cosmid containing the LHRH locus by PCR screening of the chromosome 8 cosmid library using gene-locus specific primers. An LHRH gene-containing cosmid was then subjected to (GT)n repeat detection and isolation (see below), to identify a polymorphic STRP for genetic linkage mapping of LHRH.

[Figure 3: graphic not recoverable from the extracted text]

ii) Somatic and Radiation Cell Hybrid Mapping

In this thesis project, an alternative coarse mapping strategy employing somatic and radiation cell hybrids was also used to identify cosmids residing in the candidate map interval between D8S87 and D8S137. First, the cell hybrids were assayed for framework marker content. Next, human inter-Alu polymerase chain reactions (inter-Alu PCR; Nelson et al. 1989) were performed upon the selected somatic and radiation reduced hybrids using the Alu-element end specific primers ALE1 (sequence 5'-GCCTCCCAAAGTGCTGGGATTACAG-3') and ALE3 (sequence 5'-CCACTGCACTCCAGCCTGGG-3') (Cole et al, 1991). Each 25 µl PCR reaction contained approximately 100 ng of source DNA amplified under standard conditions. 35 cycles of 94°C (30 sec), 62°C (30 sec) and 72°C (1 min) were run for each reaction.
Inter-Alu PCR product pools were labeled by nick translation with 50 µCi of [α-32P]-dATP, annealed with a vast excess (>100 µg) of total human genomic DNA and hybridized overnight at 65°C in standard hybridization solution to high density colony blot filters of LA08NC01 clones. Initially, inter-Alu PCR products from hybrid clones 1HL12 and 20xPO435-2 (Wagner et al, 1991) were employed as hybridization probes against high density filters of the entire chromosome 8 cosmid library to identify chromosome 8p clones. This subset of cosmids was further limited by screening with inter-Alu PCR products from a radiation-reduced somatic cell hybrid, 50N (Kirchgessner et al, 1995), which was negative for known STS probes distal to D8S137. High density filters were prepared from cosmids positive for all three hybrids and subjected to additional hybridization screens. Inter-Alu PCR products from two novel, higher resolution hybrid panel clones, 2N2 and 1E1 (Sapru et al, 1994), were used to probe the library, together with a second radiation hybrid clone, 501 (Kirchgessner et al, 1995), thought by STS content assay to be further restricted in chromosome 8p content relative to 50N. Cosmids testing positive in this screen for non-repetitive inter-Alu PCR products were retained as an enriched sub-library of clones tentatively localized to the region near D8S339.

3. Genetic Linkage Mapping

a) STRP Marker Development

Potential (dCdA)n·(dGdT)n simple tandem repeat (STR) sequences of length n>15 were detected by hybridizing a radiolabeled oligo (GT)n probe (Pharmacia), consecutively, to colony blots of cosmids; to Southern blots of (generally EcoRI) restriction digested cosmids run on agarose gels; and to colony blots of (GT)n positive bands from the cosmid blots subcloned into Bluescript II KS vector (Stratagene). Small EcoRI fragment (GT)n positive bands were subcloned by DEAE paper isolation, ligation and transformation into EcoRI digested Bluescript vector.
Resulting clones were plated on XIA plates, and picked transformants (white colonies) were gridded for colony blots, which were screened by hybridization with labeled oligo (GT)n probe. For cosmids whose EcoRI fragment (GT)n positive bands were too large to subclone, labeled oligo (GT)n hybridization was undertaken on Southern blots of single and double restriction digest gels using a panel of other Bluescript cloneable enzyme sites. Small positive bands for other enzymes were DEAE isolated and subcloned, or shotgun subcloned. In either case, small subclones (less than 400-600 bp in size) were directly sequenced from each end. For larger subclones, a restriction map was constructed from gels of single and double digests with a panel of enzymes, followed by Southern blotting for labeled oligo probe hybridization, to localize the (GT)n positive fragment. If the positive fragment mapped near one end or the other of the insert, direct sequencing of the STR was attempted from that end. Otherwise, the subclone was subjected to further subcloning with suitable enzymes, as indicated by the restriction map.

Where a suitable restriction map could not be directly constructed to localize the (GT)n positive fragment, an alternate strategy of (GT)n-specific degenerate primer PCR was performed. In this strategy, the Bluescript "Forward" KS and "Reverse" SK vector primers were combined in separate PCR reactions with each of a set of 6 (GT)n-specific degenerate primers designed to anneal to one or the other end of a given oligo (GT)n sequence, with specificity based upon the 3' terminal nucleotide, specifically: 5'-(GT)n[ACT]-3' and 5'-(TG)n[ACG]-3', where [] denotes the selection of one of the three dNTPs indicated. Standard PCR conditions were employed in the reaction, with 1.5 mM MgCl2 and 30 cycles with a Tm of only 40°C.
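The set of six anchored primers described above can be enumerated mechanically, which makes the anchoring rule explicit: the 3' terminal base is any base except the one that would simply continue the repeat (G after a GT unit, T after a TG unit). The sketch below is illustrative only; the repeat count n is not stated in the text, so `n_repeats` is a free parameter, and the function name is hypothetical.

```python
def gt_degenerate_primers(n_repeats):
    """Return the six primers 5'-(GT)n[ACT]-3' and 5'-(TG)n[ACG]-3'.

    n_repeats is an assumption supplied by the caller; the source text
    does not give the repeat count used in the actual primers.
    """
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    primers = []
    # Each repeat unit is paired with the three 3' anchor bases that do
    # NOT continue the repeat, so the primer is forced to one repeat end.
    for unit, anchors in (("GT", "ACT"), ("TG", "ACG")):
        for base in anchors:
            primers.append(unit * n_repeats + base)
    return primers


if __name__ == "__main__":
    # Hypothetical repeat count, for illustration only.
    for primer in gt_degenerate_primers(8):
        print(f"5'-{primer}-3'")
```

Each primer would then be paired separately with the KS and SK vector primers, as described in the text.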
STRP candidate plasmids were subsequently sequenced from both ends in order to read the inferred STR sequences directly and to devise flanking primer sequences for STRP amplification in various genetic systems. The map location of candidate markers was validated by localization upon a chromosome 8 hybrid panel (Wagner et al, 1991) and by linkage analysis (see below). CEPH mega-YACs (Bellanne-Chantelot et al. 1992) containing the polymorphic STSs were also identified by screening of YAC library DNA pools (Research Genetics Inc.).

b) Genotyping of Polymorphic Markers into CEPH Families

Candidate STRPs were genotyped into the reference CEPH family panel by radioactive end labeling of one or the other primer (0.125 µCi of [γ-32P]-dATP label to 10 pM of primer in a 20 µl reaction volume). 1 µl of labeled primer was employed in each PCR amplification of family DNA. Cosmid and plasmid control amplifications were also performed. PAGE analysis and autoradiography of PCR products detected individual alleles based upon repeat lengths. A 32P-labeled sequencing reaction of a vector plasmid (M13mp18) was employed as a size standard for estimating allele repeat length relative to the length of the cloned repeat sequence isolated in the original candidate cosmid.

c) Linkage Mapping

The linkage mapping software program CRIMAP (Lander and Green, 1987) was employed to analyze CEPH family genotype data from STRP analyses and to construct a genetic linkage map of the various markers employed in this thesis project. Loci were positioned on a CEPH reference genetic map using two-point and multi-point linkage analysis against genotypes in Version 7.1 of the CEPH database (Dausset et al., 1990).

4.
Physical Mapping

a) STS Content Screening

CEPH mega-YAC and LA08NC01 cosmid DNA pools were screened for locus content using various STSs mapped to human 8p by Genethon (Cohen, Chumakov and Weissenbach, 1993; Gyapay et al, 1994; Chumakov et al, 1995), the MIT/Whitehead Institute (Hudson et al, 1995) or within the Wood lab. An STS content map was constructed using a combination of direct PCR and cosmid-derived probe hybridization experiments against mega-YACs. The computer software SAM (System for Assembling Markers; Soderlund and Dunham, 1995) was employed to construct and visualize YAC contigs.

b) Long Range Restriction Analysis

PFGE analysis of rare-cutter restriction products from YACs was also performed. Agarose plugs of YACs were restricted in single and double digest reactions with 5-20 units of AscI and NotI enzymes (New England Biolabs) for 4 to 16 hours, under the manufacturer's recommended conditions. Digested plugs were then run in a 1% pulsed field gel at 170 volts for 36 hours, with a 60-120 second interpolated ramp of alternating NS and EW pulses. Resulting gels were visualized by post-run stain/destain with ethidium bromide, then acid-nicked in 0.25 N HCl and Southern blotted onto Hybond N membranes (Amersham). Blots were stripped to permit sequential hybridization with several probes.

c) Transcript Mapping

DNA pools of LA08NC01 were screened by PCR to obtain clones containing the given STS or gene locus, in order to obtain locus-associated DNA probes serving as transcript mapping reagents. In some situations, locus-containing cosmids were identified in the course of other hybridization screens in the Wood lab and also served as a DNA source for probe development.

d) Hybridization Probes

Pulsed field gel Southern blots were probed with total human genomic DNA and with single copy hybridization probes developed from cosmids positive for loci under consideration.
Hybridization of Southern blotted restriction digests of locus-positive cosmids, using radiolabeled total human genomic DNA, yielded non-repetitive (non-hybridizing) single copy restriction fragments for use as locus-associated probes. All probes were radiolabeled for hybridization with [α-32P]-dATP using the random priming method.

C. Bioinformatics Applied to the WRN/DExH Gene Family

In the third phase of this thesis project, bioinformatic techniques were applied to the comparative genomic analysis of the WRN/DExH family of DNA helicases.

1. Hardware

A portion of the work was undertaken on a Sun Sparc 20 system running Sun OS 4.0 UNIX or Sparc OS. The remainder of the work (e.g. ACEDB for Windows development work) was undertaken upon Intel microprocessor based personal computers running (in historical sequence) the Windows NT 3.51, Windows '95, Windows NT Workstation 4.0 or NT Server 4.0 operating systems.

2. Software

a) ACEDB for Windows

ACEDB for Windows is a Visual C/C++ Microsoft Foundation Class (MFC) based port of the Unix ACEDB to Microsoft Windows '95/NT 4.0, undertaken by this candidate during his thesis research period. The final product (summer 1998) is based upon the Enterprise Release 5.0 version of this development system. Some script programming was also undertaken in Perl (both UNIX and Windows NT versions). ACEDB for Windows, in conjunction with a shareware screen capture utility, was employed to generate all the ACEDB related screen dumps in this thesis.

b) ACEDB for Gene Family Analysis

In addition to creating the port of ACEDB to Microsoft Windows, this doctoral candidate attempted to design and implement novel generic (non-platform specific) ACEDB functionality supporting the gene family analysis undertaken in this thesis.
To this end, a novel ACEDB graphic display, the "Dendrogram" tree display, was conceived, designed and implemented using the generic capabilities of the ACEDB graphics library and database kernel, in a manner that also permits its operation on platforms other than Windows (i.e. UNIX). This novel database functionality was employed to represent taxonomic and phylogenetic tree data generated in this project. Other smaller modifications of ACEDB to support gene family analysis were also undertaken, and existing ACEDB functionality was employed where indicated to support data management during the thesis project.

c) Web Site Construction

The "HelicaseWeb" WWW site was designed using Microsoft Frontpage '98 and published to a Microsoft Internet Information Service (IIS) 4.0 web server running under Windows NT Server 4.0 on an Intel-microprocessor-based workstation. An attempt was made to develop a WWW query page designed to interface to a Windows-based WRN helicase ACEDB database, "HelicAceDb", using ACEDB for Windows client/server functionality.

d) Specific Genomic Databases and Analysis Software

Bioinformatics in the project employed public domain genomic analysis tools currently available on the WWW. In some cases, WWW server resources were used directly off the web. In other cases, public domain software packages were downloaded by FTP from the Internet and rebuilt on MS Windows directly from the source code using Visual C/C++. For certain packages (ClustalW for MSDOS), some troubleshooting and modification of the source code was required in order to successfully build and run the programs. Table 2 lists some of the principal WWW based resources employed in data analysis. Table 3 lists public domain software packages (other than ACEDB) installed locally for analysis.

3.
Data Analysis

a) WRN Bioinformatics Analysis

BLAST (Altschul et al., 1990) and PSI-BLAST (Altschul et al., 1997) were employed for various database searches against the "non-redundant" Genbank database and other more specialized databases (e.g. C. elegans genome project databases). The "gi" accession numbers of sequences exhibiting significant BLAST HSP hits were extracted from the BLAST output file using a custom Perl script (""). A Pearson FASTA-type (Pearson and Lipman, 1988) library file of amino acid sequences was retrieved from Genbank using the "gi" accession number list file based retrieval facility of NCBI Batch Entrez (

Table 2. WWW Resources Employed in Bioinformatics Analysis

Resource     Description                                                                           URL   Citation
BLAST        Database search engine (various variants)                                                   Altschul et al., 1990, 1997
TESS         Transcription Element Search Software                                                       Schug and Overton, 1997
BLOCKS       Database of conserved protein motifs                                                        Henikoff and Henikoff, 1994
Blockmaker   Tool to generate a blocks database from a set of sequences postulated to be related         Henikoff et al. 1995

Table 3. Locally Installed Software for Analysis

Name       Description                                                   Source     Citation                             Remarks
ClustalW   Multiple sequence alignments and phylogenetic tree building   EBI-EMBL   Thompson, Higgins and Gibson, 1994   Program required debugging and source code repair to run under MS Windows
MACAW      Multiple sequence alignments and presentation                 NCBI       Schuler, Altschul and Lipman, 1991   Local alignments constructed interactively by user

The FASTA-type library file was then input into the ClustalW multiple sequence alignment (MSA) program (Thompson, Higgins and Gibson, 1994), built and run locally on a Windows NT 4.0 computer system. Multiple alignments of subsets of sequences were also performed interactively using MACAW (Schuler, Altschul and Lipman, 1991) running on the Windows PC.
This latter program provides for interactive construction of multiple alignments using regular expression (Kleene, 1956) searching for motifs, segment pair overlap (Karlin and Altschul, 1990) or Gibbs sampling (Lawrence et al, 1993). Sequences suspected, by visual inspection of MSAs, to be duplicated in the database were manually aligned and inspected using MACAW; for each set of duplicated sequences, a single representative sequence was retained. ClustalW was also employed to generate phylogenetic tree files using the Neighbor-joining method of Saitou and Nei (1987), based on a matrix of "distances" between all sequences.

Several multiple alignments were restricted to particular fixed width sequence regions centred upon conserved motifs. A Perl script was designed to extract the specified sequence windows for alignment. Several iterations of ClustalW alignment were typically undertaken upon the sequence sets. ClustalW multiple sequence alignment output (in gapped FASTA format) also served as input for phylogenetic inference by Bayesian evolutionary tree estimation (Sjolander 1998a,b) based upon Dirichlet mixture priors (Brown et al. 1993; Sjolander et al, 1996). In addition to phylogenetic tree construction, this latter technique provides statistical measures for highly conserved amino acid residue positions in the multiple sequence alignment. Pairwise sequence comparisons of candidate gene ortholog pairs were undertaken with BLAST2 (BLAST comparison of two sequences) and ClustalW (using "slow" dynamic programming mode).

III. RESULTS

A. Genomic Map Refinement in the WRN Candidate Region

1. Genetic Linkage Map Construction

Characterization of genetic markers in the vicinity of WRN was sought using two approaches: a candidate locus mapping approach (LHRH) and a random chromosome wide marker search (hybrid inter-Alu product screen for 8p region specific STRP containing cosmids).
a) Linkage Mapping of LHRH

The STS primers for LHRH were 5'-CCTTGTCTGGATCTAATTTGATTG-3' and 5'-TCACCTGGAGCATCTAGGGTACA-3' (exon #2 derived primers, Nakayama et al, 1990), with amplification of a 305 bp product optimized at 2.0 mM MgCl2. Human 8p assignment was confirmed by positive amplification of the STS in MGV281, a human 8p positive hybrid; the negative human 8q control was MGV271 (Figure 4). STS screening of DNA pools of the cosmid library initially identified a candidate LHRH (exon #2) containing cosmid, 37F12. A candidate STRP was tentatively isolated in a 6.6 kb PstI fragment from 37F12; however, 37F12 appeared to be unstable and difficult to characterize. A second experiment was attempted using the LHRH exon #2 STS PCR product as a hybridization probe against high density colony blot filters of the LA08NC01 cosmid library (Figure 7). This second experiment yielded a second candidate cosmid, 145G1 (Figure 8), validated by PCR with the LHRH exon #2 system. Candidate STRPs in 145G1 were then identified by oligo (GT)n probe screening of the cosmid and of shotgun subclones derived from it using various enzymes. One polymorphic STS of structure 5'-(CA)23T2A3(CT)n-3' was isolated from a 0.6 kb PstI shotgun Bluescript subclone of cosmid 145G1 (Figure 9). The corresponding PCR system designed from sequence data (Figure 10) is (CA strand) 5'-GACTTATCCTCCTTGTTTCCC-3' and (GT strand) 5'-ATAAAGGACAGTCATTCTGGAG-3', run at a Tm of 58°C with a MgCl2 concentration of 2.0 mM. The cloned allele yields a 237 nucleotide product upon amplification. All eight CEPH families were informative (sample data for family 884 shown in Figure 11), giving 10 additional alleles ranging in size from 219 bp to 243 bp, plus one smaller allele of 187 nucleotides (Table 4; Appendix C). The allele frequencies (Table 4) estimated from 54 independent alleles in the eight CEPH families give a calculated heterozygosity for the marker of 0.86.
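The reported heterozygosity can be checked against the observed allele frequencies listed in Table 4. The sketch below assumes the standard expected heterozygosity, H = 1 - sum(p_i^2); the thesis does not state which estimator was actually used, so this is a consistency check rather than a reconstruction of the original calculation. The variable and function names are illustrative.

```python
# Non-zero observed allele frequencies for the LHRH STRP, keyed by allele
# length in bp, as listed in Table 4 of this thesis.
LHRH_ALLELE_FREQUENCIES = {
    243: 0.0185, 241: 0.0185, 239: 0.0741, 237: 0.1481,
    235: 0.1111, 233: 0.1481, 231: 0.0370, 229: 0.0556,
    223: 0.0370, 221: 0.2593, 219: 0.0370, 187: 0.0556,
}


def expected_heterozygosity(frequencies):
    """Return 1 - sum(p_i^2) for an iterable of allele frequencies.

    Assumed estimator: the standard expected heterozygosity, with no
    small-sample correction applied.
    """
    return 1.0 - sum(p * p for p in frequencies)


heterozygosity = expected_heterozygosity(LHRH_ALLELE_FREQUENCIES.values())

# Allele counts implied by the frequencies and the 54 independent alleles.
implied_counts = {
    length: round(p * 54) for length, p in LHRH_ALLELE_FREQUENCIES.items()
}

if __name__ == "__main__":
    print(f"Expected heterozygosity: {heterozygosity:.2f}")
```

Under this assumption the tabulated frequencies give a value that rounds to 0.86, in agreement with the figure quoted in the text, and the implied allele counts sum to the 54 independent alleles reported.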
Pairwise LOD scores for LHRH against the reference set of 8p markers were calculated (Table 5(a)); no recombination was observed between LHRH and D8S5, with a Zmax of 9.33. Multipoint linkage analysis against the CEPH 7.1 reference alleles (Table 5(b)) placed LHRH with equal likelihood in either of the genetic intervals flanking D8S5, bounded proximally by D8S137 and distally by D8S136.

Figure 4. Confirmed LHRH Exon #2 STS Localization to 8p
[Gel image. Lane labels: LHRH exon #2 Mg2+ titration at 1.0, 1.5, 2.0 and 2.5 mM; D8S135 control reaction; 305 bp product indicated.]
PCR reactions were run on a 2% gel; 8p and 8q hybrids with negative DNA control; D8S135 used as positive control for PCR pool reagents. LHRH exon #2 system titred with Mg2+ to determine optimal concentration; 305 bp expected LHRH product.

Figure 5. Primary STS Screening of LA08NC01 Cosmid DNA Pools for 37F12
[Gel image; 305 bp product indicated.]
LA08NC01 primary DNA pools amplified with LHRH exon #2, conditions as noted in text; DNA free negative control, D8S135 positive control for PCR pool reagents; MGB281 8p hybrid positive control for LHRH system.

Figure 6. Secondary Screen for 37F12 in LA08NC01 Cosmid DNA
Secondary screen of LA08NC01 DNA pools, conditions as noted in text. Plate positive amplification result shown.

Figure 7. LHRH Exon #2 Hybridization against LA08NC01 Colony Blot
[Filter image; cosmid 145G1 indicated.]
High density filter of cosmids arrayed by Biomek robot onto a membrane, grown and lysed by a standard colony blot protocol, then hybridized under standard conditions to a labeled probe derived from the LHRH exon #2 PCR product.

Figure 8. EcoRI Digest of LHRH Exon #2 Positive Cosmid 145G1 (3 samples)
[Gel image. Band size labels: 15 kb, 10 kb, 3.6 kb, 3.2 kb, 3.1 kb, 2.4 kb, 1.1 kb, 1.0 kb, 0.85 kb, 0.75 kb.]
Three independent DNA mini-preps of cosmid 145G1 were digested with EcoRI then run on a 1% agarose ethidium gel.

Figure 9.
Digests of (GT)n Positive 0.6 kb PstI Fragment of 145G1
[Gel image. Enzyme labels: PstI, HindIII, PvuII, RsaI; 600 bp fragment indicated.]
DNA mini-preps of three identical subclones containing a 0.6 kb PstI fragment of cosmid 145G1 were restricted with the enzymes indicated.

Figure 10. Portion of a Sequence Gel for the LHRH STRP
[Autoradiograph image. Lanes: T, G, C, A.]
A portion of the autoradiograph of a 35S-labeled sequencing gel of the cloned LHRH allele, showing the complex repeat structure. Lanes are labeled with the ddNTP used in the sequencing reaction.

Figure 11. Representative Autoradiograph of Genotyping Gel for LHRH STRP
[Autoradiograph image; 237 bp cloned allele indicated.]
Sample autoradiograph of a gel of CEPH family 884 genotyping PCR reactions. Parental alleles are not shown; the cloned allele size is indicated, but individual allele sizes are not given here. Family pedigrees and detailed data for all families genotyped are available as appendices to the thesis.

Table 4. LHRH STRP Alleles Observed in All CEPH Reference Families

Allele     Length (bp)   Observed Frequency
1          245           0.0
2          243           0.0185
3          241           0.0185
4          239           0.0741
5          237           0.1481
6          235           0.1111
7          233           0.1481
8          231           0.0370
9          229           0.0556
10         227           0.0
11         225           0.0
12         223           0.0370
13         221           0.2593
14         219           0.0370
15 to 29   217-189       0.0
30         187           0.0556

[Table 5: (a) pairwise LOD scores for LHRH against the reference set of 8p markers; (b) multipoint linkage analysis against the CEPH 7.1 reference alleles. Table content not recoverable from the extracted text.]

b) Somatic and Radiation Cell Hybrid Mapping

The Alu probe based cell hybrid hybridization screen undertaken by Mr. Mike Schertzer of the Wood lab resulted in the construction of a somatic/radiation hybrid map partitioning chromosome 8 into 6 regions (see Table 6 and Figure 12). A list of 98 candidate cosmids was hypothesized to reside in a genomic region designated "CI" by somatic and radiation hybrid analysis, a region tentatively inferred to lie immediately distal to the marker D8S339. A subset of 37 of these clones exhibited positive hybridization of oligo (GT)n probes to colony blots and was selected for further analysis (Table 7 and Figure 14).

c) Isolation of STRPs from "CI" Sublibrary Cosmids

The 37 cosmids of this "CI" sublibrary were restricted with EcoRI, run on 0.8% agarose gels (Figure 13), Southern blotted and screened for (GT)n STRs (one representative blot shown in Figure 14). Standard STR isolation protocols were employed to isolate a number of candidate STRPs, which were assessed by Wagner panel mapping and by genetic linkage against the CEPH reference families. Work generally proceeded in parallel upon the entire subset of cosmids, to various stages of completion, until the decision was made in the spring of 1996 to terminate this thesis activity. The most closely linked marker from this work, D8S2297, was subsequently published in a comprehensive physical map of the WRN candidate region (Bruskiewich, Schertzer and Wood, 1997).
The general summary of the outcome of the experiments is provided at the base of Table 7. STRP systems which were characterized from the experiments are listed in Table 8.

[Table 6, Figure 12 and Figure 13 appear here in the original; their content is not recoverable from the extracted text.]

Table 7. List of (GT)n Positive "C1" Sub-interval Cosmids

LA08NC01 cosmids binned to sub-interval "C1" (Table 6 and Figure 12) are as indicated here, with a coarse band size assessment for the strongest positive hybridizing candidate (GT)n band from autoradiography of EcoRI restricted cosmids (representative blot in Figure 14).

Cosmid    (GT)n Band     Mapping Results/Remarks
6G10      Large          Candidate 10 kb fragment subcloned (?)
7A9       Large
13A3      Large
30E12     Small
37C10     Small
37D5      Vector Sized   Excluded; mapped by others to 8q, at D8S344
41A8      Small          STRP developed; Wagner panel mapped to 8I (8q24)
44C8      Small          Candidate 2.4 kb fragment subcloned
45H5      Small          Candidate 3.6 kb fragment subcloned
47H11     Large          Excluded; mapped by others to 8q, by GRINA FISH
53C3      Small          STRP developed; Wagner panel mapped to 8pC (8p11.2-21)
57F4      Small          STRP developed; amplifies hamster DNA (+ CHO control)
81B10     Small          STRP developed; Wagner panel mapped to 8H (8q22.2-22.3)
83D4      Small          STRP developed; Wagner panel mapped to 8pC (8p11.2-21)
86F4      Small          Candidate 2.0 kb fragment subcloned
91B6      Vector Sized   Candidate 8.9 kb fragment subcloned
106H2     Small
108C11    Large          Subcloned GT truncated by sCos vector
114H3     Vector Sized   Candidate 7.4 kb fragment subcloned
124B8     Large
124H5     Vector Sized   Candidate 2.0 kb fragment subcloned
128B9     Small          STRP developed; Wagner panel mapped to 8pC (8p11.2-21)
128D5     Small
130G7     Small          STRP developed; PCR system failed
137A9     Small          STRP developed; Wagner panel mapped to 8A (8p22-23)
157D5     Vector Sized
157F6     Large
157H9     Small          Cosmid fingerprint identical to 137A9?
160A3     Small          Cosmid fingerprint identical to 137A9?
165C10    Small          Candidate 2.4 kb fragment subcloned
172C6     Large
172H12    Small          STRP developed; Wagner panel mapped to 8I (8q24)
176H7     Large
178B3     Small          Candidate 6.6 kb fragment subcloned
192D11    Small          Candidate 3.3 kb fragment subcloned
194D5     Vector Sized   Candidate 8.7/6.5 kb fragments subcloned
197F7     Small          Cosmid fingerprint identical to 137A9?

Table 7 (cont'd) Summary Statistics of (GT)n Positive "C1" Subregion Cosmids

Total Cosmids in "C1" Sublibrary:                       37
Excluded by other map results:                          (2)
Duplicate Cosmids (by EcoRI / (GT)n fingerprinting):    (3)
Remaining "C1" Cosmids:                                 32
Candidate (GT)n Subclones:                              20
Candidate (GT)n Sequenced:                              11
Fully Testable STRPs Developed:                          9
STRPs assigned to Wagner "8C":                           3
Genotyped/linkage mapped STRPs:                          2

Figure 14.
Southern of EcoRI Restricted "C1" Cosmids Probed with Oligo (GT)n

The EcoRI restriction gels of Figure 13 were Southern blotted and hybridized to an oligo (GT)n probe with a standard protocol. Typically, the strongest signal represented the band to be subcloned for STRP development.

[Table 8 appears here in the original; its content is not recoverable from the extracted text.]

d) STRPs Mapping in the Vicinity of the WRN Candidate Region

Only three candidate polymorphic STSs, characterized from the three independent "C1" region cosmids 53C3, 83D4 and 128B9, were assigned to region "C" of the Wagner chromosome 8 hybrid panel. STRPs for cosmids 53C3 and 128B9 were subsequently genotyped against the CEPH reference families. Genotyping of the 83D4 STRP was not completed due to time limitations.

i) STRP from Cosmid 53C3

A polymorphic STS (cos53C3PA) derived from a 3.6 kb EcoRI fragment of cosmid 53C3 in the region-specific cosmid sublibrary was characterized. Degenerate GT primer PCR suggested that a (GT)n STR lay within 150 bp of the reverse primer end of the Bluescript vector, so the insert was directly sequenced from this end. A dinucleotide repeat of structure (GT)20 was isolated and a PCR system devised as indicated in Table 8. Genotypes from 7 CEPH reference families were obtained. The estimated heterozygosity of the marker is 0.85, calculated from 44 independent CEPH family panel chromosomes exhibiting 8 observed alleles ranging in size from 82 bp to 96 bp (Table 9). Tight CRIMAP computed linkage was detected between D8S5 and cos53C3PA, with no recombination observed and Zmax = 9.03 (Table 10). As with LHRH, multi-point linkage against CEPH 7.1 reference alleles placed cos53C3PA with equal likelihood on either side of D8S5.

Table 9.
cos53C3PA STRP Alleles Observed in CEPH Reference Families

Allele   Length (bp)   Observed Frequency
1        96            0.045455
2        94            0.159091
3        92            0.136364
4        90            0.159091
5        88            0.181818
6        86            0.181818
7        84            0.090909
8        82            0.045455

[Table 10 appears here in the original; its content is not recoverable from the extracted text.]

ii) STRP from Cosmid 128B9 (D8S2297)

A polymorphic STS (subsequently registered as D8S2297) was derived from an 870 bp EcoRI fragment of cosmid 128B9. The relatively small clone size encouraged immediate sequencing, which was rewarded with early characterization of an STR with a cloned allele repeat sequence structure 5'-(GATA)6TA(GT)21T(TG)4-3'. PCR primers and parameters devised to amplify this system are indicated in Table 8. The system was genotyped in CEPH families 102, 884 and 1413. The heterozygosity of the marker is 0.828, calculated from 16 independent CEPH family panel chromosomes exhibiting 8 observed alleles ranging in size from 111 bp to 131 bp (Table 11). CRIMAP computed linkage was detected between D8S339 and D8S2297 with Zmax = 6.17 at a recombination fraction of 0.03 (Table 12).

Table 11.
D8S2297 STRP Alleles Observed in CEPH Reference Families

Allele   Length (bp)   Observed Frequency
1        131           0.0625
2        129           0.125
3        127           0.0625
4        125           0.125
5        123           0.0
6        121           0.0
7        119           0.3125
8        117           0.0
9        115           0.125
10       113           0.125
11       111           0.0625

e) Integrated Genetic Linkage Analysis of Thesis STRPs

An integrated genetic linkage analysis using CRIMAP was performed for all three STRPs characterized in this thesis (LHRH, cos53C3PA and D8S2297). The results of this analysis are presented in Table 13 and Table 14.

[Tables 12, 13 and 14 follow here in the original; their content is not recoverable from the extracted text.]
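As an illustrative check, which is not part of the original analysis, the heterozygosity values quoted for cos53C3PA (0.85) and D8S2297 (0.828) can be reproduced from the tabulated allele frequencies as one minus the sum of the squared frequencies. The Python sketch below assumes that uncorrected estimator, which the text does not name explicitly, and all function names are hypothetical. It also evaluates a textbook phase-known two-point lod score; that is a simplification and not the pedigree likelihood that CRIMAP computes.

```python
import math

# Observed allele frequencies copied from Table 9 and Table 11.
COS53C3PA = [0.045455, 0.159091, 0.136364, 0.159091,
             0.181818, 0.181818, 0.090909, 0.045455]
D8S2297 = [0.0625, 0.125, 0.0625, 0.125, 0.0, 0.0,
           0.3125, 0.0, 0.125, 0.125, 0.0625]


def heterozygosity(freqs):
    """Expected heterozygosity, H = 1 - sum(p_i^2) (assumed estimator)."""
    if abs(sum(freqs) - 1.0) > 1e-3:
        raise ValueError("allele frequencies should sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def lod_phase_known(meioses, recombinants, theta):
    """Two-point lod score for fully informative, phase-known meioses.

    Z(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    This textbook form is an illustration only.
    """
    likelihood = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    if likelihood == 0.0:
        return float("-inf")
    return math.log10(likelihood / 0.5 ** meioses)


if __name__ == "__main__":
    print(round(heterozygosity(COS53C3PA), 2))   # 0.85
    print(round(heterozygosity(D8S2297), 3))     # 0.828
    print(round(lod_phase_known(30, 0, 0.0), 2)) # 9.03
```

Under these assumptions, 30 fully informative meioses with no recombinants would give a lod score of 9.03 at a recombination fraction of zero. The text does not state how many meioses contributed to the reported Zmax, so this figure is only a scale reference.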
2. Physical Map

By the autumn of 1995, genetic mapping in the WRN candidate region was reaching fruition in this thesis project and in other laboratories, such as Genethon (Gyapay et al, 1994). An additional genetic marker, D8S1055 (MS8-134), was reported which exhibited no apparent recombination with WRN (Ye et al, 1995). Additional markers were localized into the region by an integrated genetic and radiation hybrid map (Oshima et al, 1994) and linkage disequilibrium/haplotype studies (Yu et al, 1994). Concurrently, Genethon (Chumakov et al, 1995) published a first generation YAC contig map. An STS-based map of the human genome was published soon thereafter (Hudson et al, 1995).
In addition, experiments in radiation hybrid and YAC STS content mapping within the Wood and associated laboratories (J. Trapman, personal communication) were forging a first order physical map across the short arm of chromosome 8, including portions of the WRN candidate region. It was thus reasonable in the thesis project at that time to progress beyond genetic mapping and into physical mapping of the WRN candidate region. This "Results" section on WRN region physical mapping elaborates upon and updates data which were being submitted for publication at the time that the positional cloning of WRN was announced (Bruskiewich, Schertzer and Wood, 1997).

a) Markers Selected for Physical Map Construction in the WRN Candidate Region

Genetic mapping data available by the autumn of 1995 suggested that D8S339, D8S1055, D8S2297 and a proximal marker, D8S535, could be used to define a mega-YAC contig across the WRN candidate region. In addition, preliminary mapping data placed three candidate gene loci within the WRN region: GTF2E2, GSR and PPP2CB. An early hybridization experiment with a fourth gene, heregulin (HGL), thought at the time to reside in the region, was negative (data not shown), a result anticipating subsequent mapping data for this locus (Imbert et al, 1996).

Primers for D8S339 (Thomas et al, 1993) and D8S1055 (Ye et al, 1995) were obtained (Table 15) to provide reagents for both the STS content screening of YACs and the isolation of LA08NC01 cosmids containing these loci. The cosmids 106C4 and 187E9 tested positive for D8S339 and D8S1055, respectively, by PCR screening of the LA08NC01 DNA pools. Screening of YAC DNA pools detected D8S2297 in YACs 844e2, 750e9, 807f11, 898e10 and 900b2. D8S535 was typed by STS content experiments by Mike Schertzer.
D8S540 (AFM281yb9; GSR2), an STS marker demonstrated to reside in close proximity to the glutathione reductase (GSR) locus, was mapped into the candidate interval with tight genetic linkage with D8S339 (Table 15; Wood lab; Oshima et al, 1994; Yu et al, 1994).

For GTF2E2, Mike Schertzer identified a cosmid containing the locus during a general hybridization screen for AscI rare-cutter sites in LA08NC01 cosmids. Specific subcloning of the AscI site in this cosmid, 6H7, and sequencing of the AscI-flanking DNA identified the 5' untranslated region of the GTF2E2 locus by sequence identity in a BLAST search of GenBank. Additional sequencing identified a (GT)n repeat in 6H7 as the polymorphic STS D8S540, located on the side of the AscI site opposite to GTF2E2. The cosmid 6H7, positive for both GTF2E2 and D8S540, was determined to contain a NotI and an AscI restriction site located within 20 base pairs of each other.

PPP2CB was documented in GDB/OMIM as a signal transduction protein assigned to human 8p. The Whitehead/MIT group (Hudson et al, 1995) mapped PPP2CB within WRN region YACs by EST content mapping. Cosmids containing the 3' end of the PPP2CB locus were isolated using the primers for WI-7626 (Table 15; Whitehead Institute/MIT Centre for Genome Research). One of those cosmids, 155A7, spans the promoter of PPP2CB, which is known to contain a NotI restriction site (Khew-Goodall et al, 1991). In addition, PPP2CB was verified by STS typing to reside in the mega-YAC 896f4, but not 788e4. Until the positional cloning of WRN was reported, PPP2CB was considered in this thesis project as a potential WRN candidate gene based upon mapping and biological rationale. A plan was devised to fully characterize the gene.
However, these experiments did not proceed beyond the isolation of WI-7626 containing cosmids by PCR screening of LA08NC01 DNA pools (cosmids 31H6, 44G9, 131G2, 141A3 and 155A7; Figure 15; Southern hybridization of gel with WI-7626 not shown) and limited subcloning of 155A7 and 131G2 (data not shown).

b) STS Content Map

YAC clones were grown up from -80°C accessioned glycerol stocks and DNA was prepared. The retention of YACs in the cultures was assessed by PCR with YAC vector arm primers (data not shown) and with inter-Alu primers (Figure 16). STS and gene locus content data for YACs screened within the interval spanning D8S339 are summarized in Figure 17. The program "SAM" (Soderlund and Dunham, 1995) was employed to computationally assess candidate YAC contigs. Mike Schertzer kindly provided all of the GTF2E2, GSR and D8S535 STS typing data.

c) PFGE of Whole YACs from the WRN Candidate Region

Agarose plugs were prepared for several mega-YAC clones in the vicinity of D8S339 and initially run uncut on a pulse field gel (the set of YACs is noted at the top of Figure 18). A negative control was provided by the mega-YAC host strain, AB1380. Southern blotting of this initial gel of uncut YACs with a total human genomic probe revealed that only certain YACs were retained in these clones. These were identified as 798e4, 896f4, 780e6 and 936g4 (Figure 19). YAC chromosome sizes were generally as published by Genethon, although the YAC 936g4 appeared to be smaller than the anticipated 850 kb size. Additional YAC agarose plug preparations yielded other viable YAC plugs for 763a7, 844e2 and 807f11 (data not shown), as initially verified by PCR.

Figure 15. Restriction Digests of Cosmid Isolates for WI-7626

[Gel image not recoverable. Digest panels: EcoRI, EcoRI/BglI, BglI and SstI, with lanes numbered as in the legend.]
Legend: 1 - C131G2; 2 - C31H6; 3 - C44G9; 4 - C141A3; 5 - C155A7; I - Lambda sizing ladder. 1% ethidium agarose gel of WI-7626 containing cosmids, single or double digested with enzymes as noted above the figure.

[Table 15, Figure 16 and Figure 17 appear here in the original; their content is not recoverable from the extracted text.]

Figure 18. Whole mega-YAC PFGE

[Gel image not recoverable. Size markers: 1.9 mb, 1.6 mb, 970 kb, 680 kb, 600 kb, 550 kb, 440 kb, 350 kb, 280 kb, 280 kb, 210 kb.]

Intact mega-YAC clones (AB1380 host strain as a control) were embedded in agarose blocks and run in a PFGE gel for 36 hours at 114 Volts with a 90-120 sec interpolating ramp of alternating east-west, north-south pulses in an LKB Pulsaphor Model 2015 hexagonal array PFGE apparatus.

Figure 19.
Southern Blot of Whole YAC PFGE Gel with Total Human Genomic Probe

[Blot image not recoverable. Size markers: 1.2 mb, 1.0 mb, 640 kb, 550 kb.]

The PFGE gel in Figure 18 was Southern blotted and hybridized with radiolabeled total human genomic DNA.

d) Selection of YACs for Long Range Analysis

Since chimerism, deletions and rearrangements are common in YACs, STS-content screened YACs were critically assessed for such problems both before and during long-range mapping experiments, using available (published) information and observed hybridization results. For example, the YAC clone 936g4 exhibited inconsistent fragment sizes in our Southern blot results for PPP2CB, including multiple bands of differing intensity (Figure 24), suggesting the presence of two related DNA fragments, likely in two distinct YAC clones in our agarose plug DNA. Moreover, the data suggested that the two clones contain overlapping deletions. Others (Chaffanet et al, 1996) also observed this result. For this reason, 936g4 was excluded from detailed map construction. In contrast, the YAC clone 807f11 is recorded in the Genethon database (Chumakov et al, 1995) as STS positive for the chromosome 20 marker D20S490, suggesting that this YAC is a chimera. However, since the 807f11 hybridization data were consistent and supportive of map construction, data from this YAC were retained as a secondary source, with caveats as noted below.

e) Long Range Restriction Map of the WRN Region

Cosmid clones isolated by PCR screening of LA08NC01 with gene and STS specific systems were routinely screened for AscI and NotI sites by restriction and normal agarose gel electrophoresis. Agarose plugs prepared for YAC clones 763a7, 844e2, 807f11, 896f4, 936g4 and 780e6 were subjected to detailed analysis by long range restriction mapping, to construct a contig across the region.
The results of hybridizations using the probes listed in Table 16 to Southern blots of PFGE gels (representative gel shown in Figure 21) of AscI and NotI single and double digests of the selected YACs are summarized in Table 17. Partial Southern blot data are presented in Figure 22 through Figure 26. The results include the hybridization of total human genomic DNA to AscI and NotI single and double digests of the YAC 896f4 (Figure 22(d)). An integrated long-range map (Figure 27) was constructed from the hybridization data, specific cosmid data and other observations reported here.

Table 16. STS or Gene Loci and Associated Hybridization Probes

STS or Gene Locus          Locus Positive Cosmids               Cosmid Fragment Used as Hybridization Probe
D8S2297                    128B9                                1.1 kb EcoRI fragment from cos128B9
D8S339                     106C4, 128C10                        3.5 kb EcoRI fragment from cos106C4
D8S1055                    31F11, 187E9                         2.05 kb EcoRI fragment from cos187E9
GSR                        72A7, 198A7, 6H7                     0.85 kb EcoRI - RsaI fragment from cos6H7
GTF2E2                     6H7                                  0.7 kb EcoRI fragment from cos6H7
PPP2CB, 3' to NotI site    31H6, 44G9, 131G2, 141A3 and 155A7   346 bp WI-7626 PCR product
PPP2CB, 5' to NotI site    155A7                                2.8 kb EcoRI fragment from cos155A7

The accuracy of size estimates for whole YACs cannot be fully assessed with the available Southern blot data, since the sizes lie near the compression zone for the PFGE runs. The accuracy of size estimates for smaller fragments is limited by the non-linearity of PFGE gel runs, which exhibit considerable band distortion. Size estimates were primarily derived from measurements of 896f4 (Figure 21(d)). The summed size of the rare-cutter restriction fragments of 896f4 reported in this candidate's paper (Bruskiewich, Schertzer and Wood, 1997) is 1,140 kb (note: Table 2 and Figure 2(d) of the paper rounded this value to 1,200 kb). In this thesis, a review of the Southern blots and consideration of the published data (Chaffanet et al, 1996; Yu et al, 1996b) suggest that this size is 1,200 kb.
Similarly, this work's estimates of the sizes of the two clones in 936g4 are 700 kb and 600 kb, in contrast to reported values of 770 kb and 430 kb, respectively (Chaffanet et al, 1996).

Figure 22 presents the hybridization data for the three probes for D8S2297, D8S1055 and D8S339. The D8S2297 probe hybridizes to a large 1,330 kb NotI fragment of the 1,400 kb YAC 844e2 but to no other tested YAC. This result places D8S2297 at the distal end of the contig (Figure 27). The D8S1055 probe detects this same large 1,330 kb NotI fragment but also hybridized to large 1,000 kb AscI, AscI/NotI and NotI fragments from 807f11. In contrast, the D8S339 probe hybridized to a small 70 kb NotI fragment in 844e2 and to the 1,000 kb fragments in 807f11. These latter results are consistent with the presence of a single NotI site at the proximal end of 844e2 which is polymorphic, mutated or deleted in 807f11.

The probes for D8S339 and GTF2E2 both hybridized to identically sized 220 kb AscI, AscI/NotI and NotI fragments in 896f4. However, although the probe for GTF2E2 also hybridized to AscI and NotI fragments from 780e6 practically identical in size to those observed in 896f4, the probe for D8S339 did not hybridize to 780e6 DNA. These results indicate that the D8S339 probe is located at the distal end of 896f4, about 220 kb from GTF2E2.

Map construction further indicated that GTF2E2, GSR and PPP2CB form a gene cluster (Figure 27; Southern data shown for 896f4 in Figure 22(d) and Figure 26(a-f)). Hybridization of total human genomic DNA to NotI digests of YAC 896f4 (Figure 22(d)) detected a 140 kb NotI band hybridizing the total human probe at least twice as intensely as all the other bands.
This result was interpreted as representing two identically sized, co-migrating NotI fragments: although the GSR and 3' PPP2CB probes both hybridized to 140 kb sized fragments (Figure 26(c,d)), successful map construction requires that two distinct 140 kb fragments independently hybridize to the probes.

f) Localization Hypothesis: PPP3CC?

The hybridization pattern for the 3' PPP2CB probe in Figure 26(e) exhibits faint hybridization signals for bands larger than the bands containing the PPP2CB 3' probe. Unlike Figure 24, this blot of PPP2CB probe hybridization is not overexposed to show residual radioactivity from a previous probe; rather, these signals are PPP2CB probe related. At one point during experimentation with the PPP2CB 3' STS, WI-7626, PCR detected a larger than expected 1.6 kb product in cosmid cos129E11 (Figure 20). This suggested the hypothesis that a second phosphatase gene, sharing high 3' region sequence similarity, might be present in this region. OMIM/GDB data assign one such related gene, PPP3CC (Calcineurin A3), to chromosome 8 (OMIM #114107) by hybridization of the gene's cDNA to human/hamster cell hybrids (Muramatsu and Kincaid, 1992). However, given that the PCR result was not very reproducible and that this gene was ancillary to the thesis research, this hypothesis was not further pursued experimentally.

Figure 20. PCR with PPP2CB 3' STS (WI-7626) in Chromosome 8 Cosmids

[Gel image not recoverable. Labelled bands: 1.6 kb and 346 bp (WI-7626).]

g) Map Data for Deleted YAC 936g4

Although the YAC 936g4 was not studied in detail during the construction of the physical map presented here, some interesting data are available. Assuming that the relative intensity of bands permits one to discriminate between the two 936g4 clones, the smaller clone (with the larger deletion) is the more abundant clone, judging by hybridization intensity for all probes.
Based upon hybridization data presented here, the sizes of the two derived clones, 936g4.1 and 936g4.2, are estimated to be 770 kb and 660 kb, respectively. The first clone size is in general agreement with Chaffanet et al (1996). The latter clone appears larger by about 210 kb than the Chaffanet estimate of 450 kb. The current work does not provide sufficient data to estimate deletion sizes in these clones, but Chaffanet et al (1996) estimate the smaller deletion at 280 kb, which gives a source YAC size of 1050 kb. Lacking any other source of information, this value is the working estimate used to guide interpretation of the hybridization data here.

The following observations may be made of the two 936g4 YAC clones. GTF2E2 is absent in the smaller clone but appears to hybridize to identically sized 420 kb NotI, NotI/AscI and AscI fragments in the larger clone (Figure 23). This observation is in contrast to the 220 kb fragments observed hybridizing to this locus in 896f4 (Figure 26(b)). Band sizes for D8S540 indicate that this locus and its flanking NotI/AscI sites are preserved intact in the larger clone, but absent in the smaller clone. D8S339 is absent from both 936g4 clones. Both the 5' and 3' probes for PPP2CB hybridize to both 936g4 YAC clones (see below). Comparison with fragment sizes observed in the other, intact YACs 807f11 and 896f4 indicates that PPP2CB 3' (WI-7626) and its flanking NotI/AscI sites are well preserved in the larger clone (Figure 24). This suggests that the proximal boundary of the deletion of the larger clone likely lies less than 220 kb distal of GTF2E2. The fragment size of 420 kb also likely spans the distance from the distal ("left") end of the YAC to the NotI/AscI site lying between GTF2E2 and D8S540.
Moreover, if the estimates for the 936g4 canonical and large clone sizes are accurate, then the proximal boundary of the YAC would lie a mere 30 kb beyond the PPP2CB proximal NotI site.

The probes for both markers D8S1055 and PPP2CB 3' (WI-7626) strongly co-hybridize to the same 450 kb fragment of the more abundant 936g4 clone. D8S1055 does not appear to hybridize to 896f4. The probe for the 5' end of PPP2CB also appears to hybridize to its expected 40 kb NotI fragment in the abundant 936g4 clone, but the clone appears to have lost an AscI site (D8S540 and GTF2E2 are also absent). These observations seem to suggest that the proximal boundary of the smaller clone's larger deletion links up to the (3') proximal side of PPP2CB and removes the proximal NotI site detected by PPP2CB-specific probes. They also suggest that the clone has retained no AscI sites and that one clone end lies just distal to 5' PPP2CB. The simplest interpretation might be that an inversion of the PPP2CB genome fragment occurred relative to the more distal fragment containing D8S1055. These D8S1055 data on 936g4 do suggest, though, that this locus may lie within 220 kb + 280 kb = 500 kb distal of GTF2E2, assuming that the YAC and deletion size estimates are accurate.

h) Physical Mapping Summary

The above work was completed in the spring of 1996, immediately prior to the publication of the positional cloning of WRN. Given that the identity of the WRN gene as a putative DNA helicase of the RecQ gene family was then ascertained, no further efforts were made to refine the physical map. Rather, thesis research proceeded onwards towards the functional characterization of WRN. Two approaches were taken for this task: analysis of WRN homologous loci within a model organism (Caenorhabditis elegans); and comparative genomic analysis of all known (ca. 1998) WRN homologous sequences using bioinformatics.

Figure 21. PFGE of AscI and NotI uncut, single and double digested YACs.

[Gel image not recoverable. Panels: 844e2, 807f11, 896f4, 936g4.]
Products of single and double digest AscI and NotI rare-cutter restriction of mega-YAC clones embedded in agarose blocks were run in a PFGE gel for 24 hours at 170 Volts with a 60-90 sec interpolating ramp of alternating east-west, north-south pulses in an LKB Pulsaphor Model 2015 hexagonal array PFGE apparatus.

[Table 17 and the Southern blot figures (Figure 22 through Figure 26; lanes labelled uncut, AscI, AscI/NotI and NotI) appear here in the original; their content is not recoverable from the extracted text.]
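The marker order on the integrated map can be illustrated as a "consecutive ones" check: an order is acceptable only if the markers detected on each YAC are adjacent to one another in that order. The Python sketch below is a minimal illustration and not the procedure used in the thesis or by SAM. It assumes only the probe-to-YAC hybridization hits stated in the text for D8S2297, D8S1055, D8S339 and GTF2E2 (936g4 is omitted because it was excluded from map construction); probe and YAC combinations that the text does not mention are treated as negative, which is an assumption. The function names are hypothetical.

```python
from itertools import permutations

# Hybridization hits as described in the text. Unmentioned probe/YAC
# combinations are treated as negative (an assumption of this sketch).
HITS = {
    "844e2": {"D8S2297", "D8S1055", "D8S339"},
    "807f11": {"D8S1055", "D8S339"},
    "896f4": {"D8S339", "GTF2E2"},
    "780e6": {"GTF2E2"},
}


def is_contiguous(order, markers):
    """True if all of `markers` occupy adjacent positions in `order`."""
    positions = sorted(order.index(m) for m in markers)
    return positions[-1] - positions[0] == len(positions) - 1


def consistent_orders(hits):
    """Brute-force every marker order; keep those in which each YAC's
    markers are contiguous. Mirror-image orders are reported once."""
    all_markers = sorted(set().union(*hits.values()))
    orders = []
    for order in permutations(all_markers):
        if order[0] > order[-1]:
            continue  # skip the reversed copy of an order already tested
        if all(is_contiguous(order, ms) for ms in hits.values()):
            orders.append(order)
    return orders


if __name__ == "__main__":
    for order in consistent_orders(HITS):
        print(" - ".join(order))
```

With these four markers the search is trivially small, so exhaustive enumeration is adequate; a larger marker set would need a dedicated consecutive-ones algorithm or a contig assembly program.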
[Figure residue: further PFGE blot panels, apparently with lanes labelled uncut, AscI, AscI/NotI and NotI, plus fragment-size scales; details not recoverable from the extracted text.]

Legend for Figure 27. At the top of the figure is an integrated map indicating the genomic placement of D8S2297, D8S1055, D8S339 and the GTF2E2/GSR/PPP2CB gene cluster. The D8S339 distal NotI and GTF2E2 proximal NotI/AscI sites are also indicated. Restriction sites are N = NotI and A = AscI. The parenthesized NotI site is thought to be present in 844e2 but absent in 807f11. Just below the integrated map is the YAC contig. The YAC 896f4 is expanded into a detailed long range restriction map indicating the positions of D8S339, GTF2E2, D8S540/GSR and PPP2CB. Distances are measured in kilobases. The known directions of gene transcription for GTF2E2 and PPP2CB are indicated by the arrows. Although no direction of gene transcription is indicated for GSR, D8S540 (GSR2) is thought to be located at the 3' end of the gene.

B. Bioinformatic Analysis of the WRN Gene

1. Tool Building

An ancillary outcome of my doctoral training was the conception, design, enhancement and/or novel implementation of a number of bioinformatics tools for genome sequence management and comparative analysis.

a) ACEDB for Windows

Central to this tool building work was my implementation of a version of ACEDB designed to run within the Microsoft Windows operating system environment. As of Fall 1998, a full production version of ACEDB for Windows (release 4.5.6) is available by anonymous FTP over the internet from the NCBI (under repository/acedb/winace) or the Sanger Centre (under pub/acedb/winace).

ACEDB is powerful in its capabilities for analyzing and representing genomic information, often providing guidance to experimental planning (such as with restriction enzyme site identification in DNA sequences) or revealing interesting patterns in the data (see Figure 28 and Figure 29 for examples particular to this thesis). As such, my work with ACEDB became a valuable tool for my bioinformatics-driven comparative phylogenetic analysis of helicases.

b) ACEDB Dendrogram Display

In order to represent true taxonomic and phylogenetic trees for gene family analysis, a novel "dendrogram" display module was conceived, designed and created, then employed to visualize ClustalW (Thompson, Higgins and Gibson, 1994) and Bete (Sjolander 1998a,b) tree data throughout this thesis. Refer to Appendix G. (HTML) Help Document for the New ACEDB Dendrogram Graphic Display for more details.

Figure 28. ACEDB "Dotter" Self Plot of the WRN Protein

[Screen image: dotter plot of gi1280208 (horizontal) vs. gi1280208 (vertical), with axes in residues and the annotation "Direct repeat in (human) WRN detected by dotter".]
The ACEDB "dotter" plot of (human) WRN against itself permits detection of an amino acid sequence starting at residue 424 (HLSPNDNENDTSYVIESDEDLEMEMLK) which is directly repeated at residue 451. This repeat is not observed in the WRN homologs from the phylogenetically nearest species, Mus musculus and Xenopus laevis, although a portion of the subsequence itself exhibits very high conservation (identity) in the three species. Its functional significance is unknown.

Figure 29. Peptide Display with Hydrophobic Plot of the WRN Repeat Subsequence

[Screen image: ACEDB peptide display of gi1280208, covering approximately residues 420 to 470.]

This figure is a screen dump of an ACEDB "peptide" display showing the direct repeat subsequence of the human WRN gene product. Shown is the "hydrophobic" plot for this subsequence that highlights the highly negative hydrophilic value (almost -3.5) for the "NDNEND" epitope in the subsequence.

The current version of the dendrogram display is the result of an iterative design incorporating the useful feedback of pioneering users of the display at the Sanger Centre and is now being exploited in a variety of ACEDB projects outside this thesis.

c) ClustalW

Initial attempts to run a binary version precompiled for Intel personal computers were unsuccessful. The source code for ClustalW was therefore obtained and rebuilt under Visual C/C++. Initial builds of the software crashed due to an invalid address pointer reference in one of the sequence array data structures. This problem was repaired by modifying the memory allocation and initialization of a data array. This patch repaired the crash condition and permitted the productive use of the program in this thesis project.
There were additional non-crash type bugs noted during program use: some branch distances come out negative; and iterative application of ClustalW to a given dataset often adds a leader and a trailer of "gap only" alignments. The cause of these latter two problems and their full impact upon the results presented in this thesis are not assessed here. The magnitude of the negative branch distances was not ascertained; however, these errors are reported by the ACEDB dendrogram display software and the absolute values of the distances are used for tree construction.

d) Perl Scripts

As the need arose, in particular to convert character data from one format (e.g. "Genbank" records) into another (e.g. ACEDB .ace file formats), Perl based scripts were designed, implemented and run using Perl for Windows NT (refer to Appendix E. Computer Program Listings (Perl scripts et al)).

2. Analysis of the WRN/DExH Gene Family

a) Compilation of WRN Homologous Helicases

A total of 99 WRN homologous DExD/H helicases (Table 18) identified by PSI-BLAST searches on the non-redundant Genbank protein database were ultimately compiled into a library of FASTA formatted sequences. For the initial multiple sequence alignments (MSA) undertaken using ClustalW, only a subsequence was used, defined by a 500 residue sequence window centered upon the so-called DExD/H helicase domain box defined by the regular expression "[CFILMV][AIMV][CFILMV]DE[ACGILS][DH]" (where square brackets denote a choice to be made from the list of (single letter) amino acid residues given). ClustalW parameters used in various runs are given in Table 19, Table 20 and Table 21 (refer also to Appendix F. ClustalW (1.7) Multiple Sequence Alignment of WRN-related Helicases). Phylogenetic trees in the Phylip ("New Hampshire") format were generated within ClustalW and subjected to 10,000 bootstrapping trials with an initial random seed number of 871 (a number arbitrarily selected off my automobile's license plate number :-).
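The window extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the thesis: the function name `dexh_window` is hypothetical, the first match of the quoted regular expression is taken as the helicase box, and the window is shifted or shortened where it would run past either end of the protein (the text does not say how such cases were handled).

```python
import re

# DExD/H box pattern exactly as quoted in the text.
DEXH_BOX = re.compile(r"[CFILMV][AIMV][CFILMV]DE[ACGILS][DH]")


def dexh_window(sequence, width=500):
    """Return (start, end, subsequence) for a window centred on the first
    DExD/H box in `sequence`, or None when no box is found.

    Coordinates are 0-based and end-exclusive. Near the ends of the protein
    the window is shifted, and for short proteins shortened, to stay in range
    (an assumption made for this sketch).
    """
    match = DEXH_BOX.search(sequence)
    if match is None:
        return None
    centre = (match.start() + match.end()) // 2
    start = max(0, centre - width // 2)
    end = min(len(sequence), start + width)
    start = max(0, end - width)  # shift left if the window was clipped on the right
    return start, end, sequence[start:end]


if __name__ == "__main__":
    toy = "GITLIAVDEAHCISEWG"  # toy peptide for illustration only
    print(dexh_window(toy, width=6))
```

Applied to each record of a FASTA library, a function of this kind would yield the fixed-width subsequences that are then passed to the alignment program; records returning None would need to be inspected by hand.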
The resulting bootstrap labeled phylogenetic tree was read directly into ACEDB for Windows for display upon a novel "dendrogram" tree display designed and implemented during this thesis (Figure 30). A Bete phylogenetic tree was generated using similar multiple sequence alignments as input. The resulting tree dendrogram for the close RecQ subfamily homologs to WRN is shown in Figure 31. In addition, Bete analyses provided a table of statistics pertaining to amino acid residue conservation in the alignment (data not shown). In addition to identifying the seven canonical helicase motifs and a recently published IVa domain (Korolev et al, 1998), this analysis highlighted an additional highly conserved region centered upon a glutamine residue identity lying about 20 residues upstream of the N terminal end of helicase motif domain I (Figure 32).

Table 18. List of WRN/DExx Family Genes Used in the Analysis

Entrez accessioned proteins were ascertained as WRN homologs by a PSI-BLAST search of the non-redundant Genbank database using the WRN gene product protein sequence as a seed. Generally, only non-duplicate sequences above the default expectation threshold were retained. The table is ordered by Entrez ID (not by any sequence similarity relationship).
Entrez ID   Description           {Species}
gi0003641   DBP1                  {Saccharomyces cerevisiae}
gi0005022   rad15                 {Schizosaccharomyces pombe}
gi0050823   eIF-4AII              {Mus musculus}
gi0096321   A42357                {Escherichia coli}
gi0113825   AN3                   {Xenopus laevis}
gi0116351   CHL1_YEAST            {Saccharomyces cerevisiae}
gi0118284   P68-like              {Saccharomyces cerevisiae}
gi0118411   DED1                  {Saccharomyces cerevisiae}
gi0119540   XPD_HUMAN             {Homo sapiens}
gi0124218   eIF-4A                {Saccharomyces cerevisiae}
gi0129383   P68                   {Homo sapiens}
gi0130256   PL10                  {Mus musculus}
gi0130806   PRP5                  {Saccharomyces cerevisiae}
gi0131812   RAD3                  {Saccharomyces cerevisiae}
gi0132530   RHLP                  {Escherichia coli}
gi0133134   P62                   {Drosophila melanogaster}
gi0266336   eIF-4AI               {Oryctolagus cuniculus}
gi0281834   DinG                  {Escherichia coli}
gi0281858   gi0281858             {Escherichia coli}
gi0417180   eIF-4AI               {Mus musculus}
gi0421132   mmrA                  {Escherichia coli}
gi0446778   eIF-4A                {Drosophila melanogaster}
gi0461924   DEAD                  {Klebsiella pneumoniae}
gi0464912   Sgs1                  {Saccharomyces cerevisiae}
gi0544162   DING_ECOLI            {Escherichia coli}
gi0729329   DHH1                  {Saccharomyces cerevisiae}
gi0729821   NUK-34                {Homo sapiens}
gi0790392   M03C11.2              {Caenorhabditis elegans}
gi0861396   CELF18C5.2            {Caenorhabditis elegans}
gi1001719   gi1001719             {Synechocystis}
gi1066920   CELE03A3.2            {Caenorhabditis elegans}
gi1166504   HEL64                 {Trypanosoma brucei}
gi1169228   DB10                  {Nicotiana sylvestris}
gi1169261   gi1169261             {Haemophilus influenzae}
gi1169345   YOAA_HAEIN            {Haemophilus influenzae}
gi1170507   eIF-4A3               {Nicotiana plumbaginifolia}
gi1173121   ROK1                  {Saccharomyces cerevisiae}
gi1174456   STE13                 {Schizosaccharomyces pombe}
gi1175484   ST13_SCHPO            {Schizosaccharomyces pombe}
gi1176565   CELK02F3.1            {Caenorhabditis elegans}
gi1177010   gi1177010             {Bacillus subtilis}
gi1280208   WRN                   {Homo sapiens}
gi1352438   eIF-4                 {Schizosaccharomyces pombe}
gi1363325   HEL117                {Rattus norvegicus}
gi1418571   CELR05D11.4           {Caenorhabditis elegans}
gi1592565   p72                   {Homo sapiens}
gi1666893   CHL1                  {Homo sapiens}
gi1705486   BLM                   {Homo sapiens}
gi1706338   gi1706338             {Mycobacterium tuberculosis}
gi1706437   DING_BACSU            {Bacillus subtilis}
gi1706438   DING_MYCTU            {Mycobacterium tuberculosis}
gi1708151   DBP3                  {Saccharomyces cerevisiae}
gi1708418   eIF-4A                {Schizosaccharomyces pombe}
gi1709532   RCK                   {Mus musculus}
gi1709533   P54                   {Xenopus laevis}
gi1710074   RecQ (Homolog)        {Bacillus subtilis}
gi1730960   YPVA_BACSU            {Bacillus subtilis}
gi1752904   F25H2.13              {Caenorhabditis elegans}
gi1931649   RecQ (Homolog)        {Arabidopsis thaliana}
gi2058510   RepD                  {Dictyostelium discoideum}
gi2072674   MTCY7D11              {Mycobacterium tuberculosis}
gi2128837   MJ1401                {Methanococcus jannaschii}
gi2130973   WRN (homolog)         {Mus musculus}
gi2131417   YDR291w               {Saccharomyces cerevisiae}
gi2134009   ERCC2/XPD             {Xiphophorus maculatus}
gi2136088   RCK                   {Homo sapiens}
gi2150025   eIF                   {Cryptosporidium parvum}
gi2276199   CELT04A11.6           {Caenorhabditis elegans}
gi2408082   gi2408082             {Schizosaccharomyces pombe}
gi2443810   eIF-4AIII             {Xenopus laevis}
gi2495145   XPD_CRIGR             {Cricetulus griseus}
gi2495146   XPD_MOUSE             {Mus musculus}
gi2500113   RecQ (Homolog)        {Synechocystis}
gi2500523   eIF-4A                {Candida albicans}
gi2500527   P68                   {Mus musculus}
gi2500528   DEAD2                 {Mus musculus}
gi2500540   MJ0669                {Methanococcus jannaschii}
gi2558533   DEAD                  {Danio rerio}
gi2580554   DEAD box, Y isoform   {Homo sapiens}
gi2619051   RecQ (Homolog)        {Bacillus subtilis}
gi2621738   AE000845              {Methanobacterium thermoautotrophicum}
gi2622454   gi2622454             {Methanobacterium thermoautotrophicum}
gi2642224   RecQ (Homolog)        {Ustilago maydis}
gi2648271   DEAD                  {Archaeoglobus fulgidus}
gi2773184   DEAD                  {Caenorhabditis elegans}
gi2851488   RecQ                  {Escherichia coli}
gi2983030   DinG                  {Aquifex aeolicus}
gi2995310   gi2995310             {Streptomyces coelicolor}
gi3023628   DEAD box, X isoform   {Homo sapiens}
gi3036880   SC5B8                 {Streptomyces coelicolor}
gi3047117   gi3047117             {Arabidopsis thaliana}
gi3183240   Y942_METJA            {Methanococcus jannaschii}
gi3183486   YOAA_ECOLI            {Escherichia coli}
gi3217395   CELF33H2.1            {Caenorhabditis elegans}
gi3257105   RAD3                  {Pyrococcus horikoshii}
gi3420290   FFA-1                 {Xenopus laevis}

Table 19. ClustalW (Default) Pairwise Alignment Parameters

Slow/Accurate alignments:
1. Gap Open Penalty          10.00
2. Gap Extension Penalty     0.10
3. Protein weight matrix     BLOSUM30
4. DNA weight matrix         IUB
Fast/Approximate alignments:
5. Gap penalty               3
6. K-tuple (word) size       1
7. No. of top diagonals      5
8. Window size               5

Table 20. ClustalW (Default) Multiple Alignment Parameters

1. Gap Opening Penalty       10.00
2. Gap Extension Penalty     0.05
3. Delay divergent sequences 40%
4. DNA Transitions Weight    0.50
5. Protein weight matrix     BLOSUM series
6. DNA weight matrix         IUB
7. Use negative matrix       OFF

Table 21. ClustalW (Default) Protein Gap Parameters

1. Toggle Residue-Specific Penalties :ON
2. Toggle Hydrophilic Penalties      :ON
3. Hydrophilic Residues              :GPSNDQEKR
4. Gap Separation Distance           :8
5. Toggle End Gap Separation         :OFF

[Figure residue, apparently the dendrogram figures preceding Figure 32: images and rotated legends not recoverable from the extracted text.]

Figure 32. CLUSTALW MSA for RecQ Subfamily Sequences

The full length protein sequences of all members of the RecQ gene family subtree from the ClustalW phylogenetic analysis (Figure 30(d)) were exported from ACEDB as a FASTA protein sequence library, then a full independent MSA reconstructed using ClustalW. Sequences are identified by Entrez ID number (gi1280208 is the human WRN protein itself; see Table 18); the classical core helicase motifs are coarsely annotated (e.g.
as <- Domain I ->) gil280208 -MSEKKLETTAQQRKCPEWMNVQNKRCAVEER-KACVRKSVFEDDLP 45 gi2130973 METTSLQRKFPEWMSMQSQRCATEE- -KACVQKSVLEDNLP 39 gi3420290 MTSLQRKLPEWMSVKQQEDRIDDAKKSFCKKNILEDNLP 39 gi2851488 • gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 gi04 64 912 MVTKPSHNLRREHKWLKETATLQEDKDFVFQAIQKHIANKRPKTNSPPTTPSKDECGPGT 60 g i l l75484 MTVTKTNLNRHLDWFFRES PQKIENVTS PIKTLDFVKVKVS SS D 44 gi2500113 gi0861396 - - . gil705486 MAAVPQNNLQEQLERHSARTLNNKLSLSKPKFSGFTFKKKTSS DNNVSVTN 51 gi2276199 gil06G920 . gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 gil280208 --FLEFTGSIVYSYDASDCSFLSEDISMSLSDGDWGFDMEWPPLYNRGKLGKVALIQLC 103 gi2130973 --FLEFPGSIVYSYEASDCSFLSEDISMRLSDGDWGFDMEWPPIYKPGKRSRVAVIQLC 97 gi3420290 --FMKFNGSIVYSYESNDCSLLSEDIRSSLLEEDVLGFDIEWPPVYTKGKTGKVALIQVC 97 gi2851488 gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 MDSAESELLEEEELPEIKYIDSAEFQNFDNNADQ--REIQFRWMEAF 45 gi0464912 TNFITSIPASGPTNTATKQHEVMQTLSNDTEWLSYTATSNQYADVPMVDIPASTSWSNP 12 0 g i l l 7 5 4 84 --IWKDSIPHKSKNVFDDFDDGYAIDLTEEHQSSSLNNLKWKDVEGPNILKPIKKIAVP 102 gi2500113 gi0861396 gil705486 VSVAKTPVLRNKDVNVTEDFSFSEPLPNTTNQQRVKDFFKNAPAGQETQRGGSKSLLPDF 111 gi2276199 MFTLPPVLNSSYVGIHGNSTFINFEFSFYTLPMLFLLVPILYIPITIIIILRILVKLYYA 60 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 • gi2128837 117 Figure 32. CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 VSES KCYLFHVSSMSVFPQGLKMLLENK AVKKAGVGIEGDQWKLLRDFD 152 gi2130973 VSEN KCYLFHISSMSVFPQGLKMLLENK SIKKAGVGIEGDQWKLLRDFD 146 gi3420290 VSEK KCYLFHISPMAGFPKGLKRLLEDE SVRKVGVGIEGDQWKLMSDYE 146 gi2851488 gi2619051 gil710074 . 
gil931649 gil082337 g i l l 7 6 5 6 5 ETKE PRWFRLLDEFITPLSGKRGYCSRI VEKYARVLLHQGFELSQTDGP 94 gi04 64 912 RTPNGSKTHNFNTFRPHMASSLVENDSSRNLGSRNNNKSVIDNSSIGKQLENDIKLEVIR 180 g i l l75484 ASES EEDFDDVDEEMLRAAEMEVFQSCQPLAVNTADTTVSHSTSSSNVPRSLN 155 gi2500113 gi0861396 gil705486 LQTPKEWCTTQNTPTVKKSRDTALKKLEFSSSPDSLSTINDWDDMDDFDTSETSKSFVT 171 gi2276199 FRDR NNNVYLLSAISISQCMCLLFFLADFLYLRLPTSGLLTSWCASIEPNRFI 113 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 gil280208 IKLKNFVELTDV-ANKKLKCTETWSLNSLVKHLLGKQLLKDKSIRCSNWSKFPLTEDQKL 211 gi2130973 VKLESFVELTDV-ANEKLKCAETWSLNGLVKHVLGKQLLKDKSIRCSNWSNFPLTEDQKL 2 05 gi3420290 LKLKGFIELSEM-ANQKLRCKEKWTFNGLIKHLFKEQLYKRKSYRCSNWDIFLLTEDQKL 205 gi2851488 gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 LRVAAIAASRDV-GQAGNLPSVLNNLSEVTRHLNLAQYAS TS LATVGQRKFANEWQKTV 153 gi0464 912 LQGSLIMALKEQ-SKLLLQKCSIIESTSLSEDAKRLQLSRDIRPQLSNMSIRIDSLEKEI 239 g i l l 7 5 4 84 KIHDPSRFIKDNDVENRIHVSSASKVASISNTSKPNPIVSENPISATSVSIEIPIKPKEL 215 gi2500113 gi0861396 gil7054 86 PPQSHFVRVSTAQKSKKGKRNFFKAQLYTTNTVKTDLPPPSSESEQIDLTEEQKDDSEWL 231 gi2276199 TILTIFTYHINYSTMIFPFLVSIMRLILIISPKNHKKFNGQLLRFSIPFICVYPIIFTFF 173 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 118 Figure 32. 
CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 YAA TDAYAGFII 223 gi2130973 YAA TDAYAGLII 217 gi3420290 YAA TDAYAGLLI 217 gi2851488 gi2619051 : gil710074 gil931649 gil082337 g i l l7G565 NQE RLGCFHLFL 165 gi0464 912 IKAKKDGMSKDQSKGRSQVSSQDDNIISSILPSPLEYNTSSRNSNLTSTTATTVTKALAI 2 99 g i l l75484 SNN LPF 221 gi2500113 gi0861396 gil705486 SSDVICIDDGPIAEVHINEDAQESDSLKTHLEDE RDNSEKKKNLEEAELHS 282 gi2276199 MFP AIGYCSYA 184 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 gil280208 YRNLEILDDT VQRFAINKEEEILLSDMNKQLTSISEEVMDLAKHLPHAFSKLENPR 279 gi2130973 YQKLGNLGDT VQVFALNKAEENLPLEMKKQLNLISEEMRDLANRFPVTCRNLETLQ 273 gi3420290 YKKLEGMDAHE SDSFRVGREGVADCKGVKRQLTDLSKGLMDLVNQVPNSFGCYTEAV 274 gi2851488 gi2619051 gil710074 gil9316.49 gil082337 g i l l 7 6 5 6 5 IGYREGLENLL---SNHRHLRNFRYSAKKRHDLLTEHIKNIFQERDSFIQAPEASSFALQ 222 gi0464912 TGAKQNITNNTGKNSNNDSNNDDLIQVLDDEDDIDCDPPVILKEGAPHSPAFPHLHMTSE 3 59 g i l l 7 5 4 84 PRLNNNNTNN NNDNNAIEKRDSASPTPSSVSSQISIDFSTWPHQNLLQYLDILRDE 277 gi2500113 gi0861396 gil7054 86 TEKVPCIEFDDDDYDTDFVPPSPEEIISASSSSSKCLSTLKDLDTSDRKEDVLSTSKDLL 342 gi22 76199 AYPFPFGAIIFRIERTFFGLVNNFSLLFNTLFWMTCCIITNFILLLLLIKSRCLLNAQTR 244 gilOS6920 gi3036880 gi2642224 • gi2131417 MEEGPIKKKLKSAGQGSGKTDAFRNFEQFFFRLNTLYT 38 gi2621738 gi2128837 119 Figure 32. CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 gi2130973 gi3420290 gi2851488 gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 gi0464912 g i l l75484 gi2500113 gi0861396 gil705486 gi2276199 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 gil280208 gi2130973 gi3420290 gi2851488. 
gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 gi04G4912 g i l l75484 gi2500113 gi0861396 gil705486 gi2276199 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 RVSILLKDISENL YSLRRMIIGSTNIETELRPSNNLNLLS 319 RVPVILKSISENL CSLRKVICGPTNTETRLKPGSSFNLLS 313 RAVDILEDLSEKL EELRNIMKEASKAE GNGLHFQN 3 09 TVLENWTFPLQNE QIVQLAHANTLQMRALHVKGDGFYFLN 2 62 EQDELTRRRNMRSREPVNYRIPDRDDPFDYVMGKSLRDDYPDVEREEDELTMEAEDDAHS 419 K SEISDRIIEVMERYP FSSRFKEWIPKRDILSQKISSVLEVLS 320 MISDDDDLPSTRPGSVNEELPETE 24 SKPEKMSMQELNPETSTDCDARQISLQQQLIHVMEHICKLIDTIPDDKLKLLDCGNELLQ 4 02 SMHSYKVEVSLSLT TFSMIFSYLSNAMIVICSFFFWNYT 2 83 FLICRKHWPTFKT LCGPIETALKRTVTKEDLAMVMALM 77 MRDIMIVLNRRKR 13 MLIVRKPKK 9 FEDS TTGGVQQKQIREHEVLIHVEDETWDPTLDHLAKHDGEDVLGNKVERKEDG 373 SEDSAAAGEKEKQIGKHSTFAKIKEEPWDPELDSLVKQEEVDVFRNQVKQEKGE 3 67 SED CSKKDK SILHVACK ESLAEHK-MDCKNADSQNNKD 346 EALNVAIAFNWSSSQEEMEEFYSIFTPQATNPRGKRIFQFWKMISKYDRSNGNA 316 SYMTTRDEEKEENELLNQSDFDFWNDDLDPTQDTDYHDNMDVSANIQESSQEGDTRSTI 479 NNNNSNNNNGNNG TVPNAKTFFTPPSSITQQVPFPSTIIPESTVKENSTRP 3 71 PEDNDELPETEPESDSDKPTVTSNKTENQVADEDYDSFDDFVPSQTHTASKIPVK 79 QRNIRRKLLTEVDFNKSDASLLGSLWRYRPDSLDGPMEGDSCPTGNSMKELNFSHLPSN- 461 SLAIMLRPFGNDLDTCVAPWVFYLTHPVFRKKACVTTHSNYCFCFFFRFSLNFE 337 .--MTEIQFSEMSSGKRSLETITIDDSDEETDKEP 32 PRECVFKYIDENQIYTETKIFDFNNGGFQQKENDIFELKDVDDQNQTQKSTQLL 131 SVDFIPAGNPKKILNTRRKPAYWGRLKIRSTEAGPRISRFTVEKGERETLRKPS 67 KKDEIEIVKVGGKIEDGIEVKNNQKIFANYKKVGDKYKLYRCRVGDKLIQPSK 62 120 Figure 32. 
CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gi1280208 FEDGVEDNKLKENMERACLMSLDITEHELQILEQQSQEEYLSDIAYKSTEHLSP 427 gi2130973 SENEIEDNLLREDMERTCVIP-SISENELQDLEQQAKEEKYNDVSHQLSEHLSP 42 0 gi3420290 IDSCQNENRDEDFFMTLGISEEELYMMERE-DDKKQTNPDYKLNKDSCD 394 gi2851488 gi2619051 gil710074 gil931649 MKDVLLAISNELLDDATDLSPDRVGQ 26 gil082337 g i l l 7 6 5 6 5 ' QGFLVTAQRLEEESKSDDIGAGCVTAVELACKIRNQSKTIMDTITRHPSLGPRM 3 70 gi0464 912 TLSQNKNVQVILSSPTAQSVPSNGQNQIGVEHIDLLEDDLEKDAILDDSMSFSFGRQHMP 53 9 g i l l 7 5 4 84 YVNSHLVANDKITATPFHSEAWSPLQSNIRNSDIAEFDEFDIDDADFTFNTTDPI 427 gi2500113 gi08613 96 NKRAKKCTVESDSSSSDDSDQGDDCEFIPACDETQEVPKIKRGYTLRTRASV 131 gil705486 SVSPGDCLLTTTLGKTGFSATRKNLFERPLFNTHLQKSFVSSNWAETPRLGKKNESSYFP 521 gi2276199 IFCVSKGHYSFNEYQQFPSRPQKRLVDPPIVDLDEEPPIVDLDDSFDNFHVG 3 89 gil066920 AAKKPSNYTAWAEIFNKKKTESSAVSAIEKAKKDSEKERQRQKLREAIME 82 gi3036880 gi2642224 . gi2131417 IFEFIDG-TMQRSWSASDRFSQIKIPTYTTEEMKKMISKREALFKSRLREFILEKEK 187 gi2 62173 8 EALKILKKQAVILTGRDPEIEDLLSSYGISYRYARVCQHCLHEGYLTWSSRSS 121 gi212 883 7 VLELLKSDKIFILKENEEEIEEVLKSYNLKFDYIELCPFCLLKNIYKRLTRNNR 116 g i l 2 802 0 8 NDNENDTSYVIESDEDLEMEMLKHLSPNDNENDTSYVIESDEDLEMEM LKS 4 78 gi2130973 ND DE NDSSYIIESDEDLEMEM LKS 444 gi3420290 TN EE KDMSYVIESDEDFDSEI IKS 418 gi2851488 gi2619051 gil710074 gil931649 LR QE R LRLKKQIQQLENHIRDKE SQK 52 gil082337 SASVSALTEELDSITSE LHA 20 g i l l 7 6 5 6 5 MQLAERCGVGYENEE 11 HQ ICDRFSLHPNLMSDVLLS KLSTE LAD 415 gi0464912 MSHSDLELIDSEKEN EDFEEDNNNNGIEYLSDSDLERFDEERENRTQVADIQE 592 g i l l75484 ND ES GASSDVWIDDEEDDIE NRP 451 gi2500113 g i 0861396 KN KCDDSWDDGIDEEDVS KR SED 154 gil7054 86 GNVLTSTAVKDQNKHTASINDLERETQPSYDIDNFDIDDFDDDDDWEDIMHNLAASKSST 581 gi2276199 ST SEEWSGDIAPEEEEEE- GHDS 412 gil066920 RK RREALMNE PKKQKIEPVS -TVK 105 gi3036880 gi2642224 gi2131417 AN LDPFSELTNLAQKYIPR ERD 209 gi2621738 TV H AGQIICSRCVDELIKRELK FAG 146 gi2128837 CR YGNLEICINCGINEI KE E V KI 139 121 Figure 32. 
CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 LENLNSGTVEPTHS KCLKMERNLGLPTKEEEEDDENEANEG--EEDD 523 gi2130973 LENLNSDMVEPTHS --KWLEMGTNGCLPP-EEEDGHGNEAIK EEQE 487 gi3420290 LEDLDNSTEEALGT GVPQAGLIPAKSVDTVADEEEDEGIE EEDD 4G2 gi2851488 MA 2 gi2619051 gil710074 gil931G49 SQFLSSTATRIFQY ETPKSTNYKMDQPQTDFRAHVSDQGRY--ACDS 97 gil082337 VEIQIQELTERQQE LIQKKKVLTKKIKQCLEDSDAGASNEY- -DSSP 65 g i l l 7 6 5 6 5 LDGEIGQIDQQISQ LRRKKSELTQKRQAIERKIELKTNEDS--DWT 460 gi0464 912 LDNDLKIITERKLTGDNEHPPPSWSPKIKREKSSVSQKDEEDDFDDDFSLSDIVSKSNLS 652 g i l l75484 LNQALKASKAAVSN -ASLLQSSSLDRPLLGEMKDKNHKVLMP SLD 495 gi2500113 gi0861396 TLNDSFVDPEFMDS V--LDNQLTIKGKKQFLDDGEFFTDRNVPQIDE 199 gil7054 86 AAYQPIKEGRPIKSVSERLSSAKTDCLPVSSTAQNINFSESIQNYTDKSAQNLASRNLKH 641 gi2276199 FDDFESVPAQPPSK NTLASLQKSDSEIALNQQRHDMHGRFRGFLQDDSEE 462 gil066920 LEKSVNQKISANSE DFDVSGPSNSRETQESPLDPNDFVANP LAI 149 gi3036880 gi2642224 NSIK 4 gi2131417 YEDPIEAMMKAKQESNE MSIPNYSNNSVITTIPQMIEKLKSTEFYASQIKHCF 262 gi2621738 MDGSTFRNFRRLIR RGVSLDKILEMMSPRFDPLENHELTRYDTVTSES 194 gi2128837 SEEFIEKFLKRFKD1 V DKVLSLLRIRNPLDKPELTRYDIITGSEE 183 gil280208 DK DFLWPAPNEEQVTCLKMYFGHSS FKP-VQWKVIHSVLEERR gi213 0973 EE DHLLPEPNAKQINCLKTYFGHSSFKP-VQWKVIHSVLEERR gi342 02 90 DDD-WDPSMPEPSAQHISCLKTYFGHSSFKP-VQWKWHSVLRERR gi2 8514 88 QA EVLNLESGAKQV--LQETFGYQQFRP-GQEEIIDTVLSGR-gi2619051 MLHRAQSL- -LAHYFGYEKFRS-GQDEAIRLVTEARQ gil710074 MTKLQQT- -LYQFFGFTSFKK-GQQDIIESILSGK-gil931649 WNTPRDSS FSVDRYVQVN-NKKVFGNHSFRP-NQREIINATMSGS-gil082337 AAWNKEDFPWSGKVKDI--LQNVFKLEKFRP-LQLETINVTMAGK-g i l l 7 6 5 6 5 DRWDRDGFPWSDEATKI--LKEQFHLEKFRP-LQRAAINAVMSKE-gi0464912 SKTNGPTYPWSDEVLYR--LHEVFKLPGFRP-NQLEAVNATLQGK-gi l l75484 DP--MLSYPWSKEVLGC--LKHKFHLKGFRK-NQLEAINGTLSGK-gi2500113 MADRQSLEEALRRIWGYDHFRY-PQGEVIDCLLARR-gi08613 96 ATKMKWASMTSPPQEALNALNEFFGHKGFRE-KQWDWRNVLGGK-gil705486 ERFQSLS FPHTKEMMKI--FHKKFGLHNFRT-NQLEAINAALLGE-gi2276199 FSDEVGLLGADMNKELYDTLKS KFGFNQFRH-RQKQCILSTLMGH-gil066920 GTGERLIRGQDIIERRDKVFLELFCHKKYRSRLQMQAINCILKRKC gi3 
03 6 8 8 0 --MDHVELRTEADAVLAELVGDREGSARLRE-DQWQAVAALVEEHR gi2642224 SVQHRERPKVTAEDLGAVARKLYCGDLQFRP-GQRRAMLAIMGRRQAEQVVWMPTGAGK gi2131417 TIPSRTAKYKGLCFELAPEVYQGMEHENFYS-HQADAINSLHQGE---NVIITTSTSSGK gi262173 8 ERTPRVPLDKLAVPEKFKRMLKREGNTVLRP-VQVLAVDAGLLEGE--DLMWSATASGK gi212 883 7 DKIENYKIDELDIPEELKEIIKSRGIEELLP-VQTLSVKAGLLNGD--DLLIISATSSGK DNVAVMATGYGK 577 DNWVMATGYGK 541 DNLWMATGYGK 518 DCLWMPTGGGK 53 NTACIMPTGGGK 4 6 DTIAMLPTGGGK 44 DVFVLMPTGGGK 152 EVFLVMPTGGGK 119 DAWILSTGGGK 514 DVFVLMPTGGGK 706 DVFILMPTGGGK 54 7 DCLWLPTGGGK 47 DQFVLMSTGYGK 255 DCFILMPTGGGK 695 DTFVLMPTGAGK 518 DVYVSLPTGAGK 207 RALWQRTGWGK 55 63 318 251 240 <ir Domain I -i> 122 Figure 32. CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) g i l 2 802 0 8 SLCFQYPPVYVGK IGLVISPLISLMEDQVLQLKMSN--IPACFLGSAQSEN 626 gi213 0973 SLCFQYPPVYTGK IGIVISPLIS LMEDQVLQLELSN--VPACLLGSAQS KN 590 gi342 02 90 SLCYQFAPVYTSG IGIVICPLISLMEDQVLQLEMSN--ISSCFLGSAQSKN 567 gi2851488 S LCYQIPALLLNG LTVWS PLIS LMKDQVDQLQANG--VAAACLNSTQTREQQLE 106 gi2619051 SICYQIPALMFEG TTIVISPLISLMKDQVDALEEAG--INAAYINSTQSNQEIYE 99 gil710074 SLCYQLPGYMLDG MVLIVSPLLSLMEDQVQQLKARGE-KRAAALNSMLNRQE--R 96 gil931649 SLTYQLPALICGG 1TLVISPLVSLIQDQIMNLLQAN--1PAASLSAGMEWAEQLK 205 gil08233 7 SLCYQLPALCSDG FTLVICPLISLMEDQLMVLKQLG--ISATMLNASSSKEHVKW 172 g i l l 7 6 5 6 5 SLCYQLPALLANG LALWSPLISLVEDQILQLRSLG--IDSSSLNANTSKEEAKR 567 gi0464912 SLCYQLPAWKSGKTH-GTTIVISPLISLMQDQVEHLLNKN--IKASMFSSRGTAEQRRQ 763 g i l l 7 5 4 84 SLCYQLPAVIEGGASR-GVTLVISPLLSLMQDQLDHLRKLN--IPSLPLSGEQPADERRQ 604 gi2500113 SICFQLPALLGEG LTLWSPLVALMEDQVQSLRRQN--LPAACLHSQLSRPERKQ 100 gi0861396 SVCYQLPSLLLNS MTVWSPLISLMNDQVTTLVSKG--IDAVKLDGHSTQIEWDQ 308 gil7054 86 SLCYQLPACVSPG VTWISPLRSLIVDQVQKLTSLD--IPATYLTGDKTDSEATN 74 8 gi2276199 SLCYQLPAVILPG VTVWSPLRSLIEDQKMKMKELG--IGCEALTADLGAPAQEK 571 gil06692 0 SLCYQLPAWHGG ITWISPLIALMKDQISSLKRKG--IPCETLNSTLTTVERSR 2 60 gi3036880 SAVYFVATALLRRRGA-GPTVIISPLLALMRNQVEAAARAG--IRARTINSANPED-WEA 111 gi2642224 SLLFMVGACLEGA ETTILILPTVALRANMLAKLDVMN 
IRYHVWQPGSKKAA 114 gi2131417 SLIYQLAAIDLLLKjOPESTFMYIFPTKALAQDQKRAFKVILSKIPELKNAVVDTYDGDTE 378 gi262173 8 TLIAELAGIPRALGGE--KFIYLTPLVALANQKYRDFRRRY--SPLKLKTAIKVGMSRIR 3 07 gi212 8 83 7 TLIGELAGIKNLIKTG-KKFLFLVPLVALANQKYLEFKERY--EKLGFKVSLRVGLGR-- 295 : : : * : * : <- Domain la < g i l 2 8 0208 -VLTDIKLGK--YRIVYVTPEYCSGNMGLLQQLEADIG 1TLIAVDEAHCIS-EWG- 677 gi213 0973 -ILGDVKLGK--YRVIYITPEFCSGNLDLLQKLDSSIG 1TLIAVDEAHCIS-EWG- 641 gi342 0290 -VLQDVKDGK--MRVIYMTPEFCSRGISLLQDLDNRYG ITLIAIDEAHCIS-EWG- 618 gi2851488 -VMTGCRTGQ--IRLLYIAPERLM--LDNFLEHLAHWN PVLLAVDEAHCIS-QWG- 155 gi2619051 -RLNGLKEGA--YKLFYITPERLT--SIEFIRILQGID VPLVAIDEAHCIS-QWG- 14 8 gil710074 -QFVLEHIHR--YKFLYLSPEALQ--SPYVLEKLKSVP ISLFVIDEAHCIS-EWG- 145 gil931649 -IFQELNSEHSKYKLLYVTPEKVAK-SDSLLRHLENLNSRGLLARFVIDEAHCVS-QWG- 261 gil082337 -VHAEMVNKNSELKLIYVTPEKIAK-SKMFMSRLEKAYEARRFTRIAVDEVHCCS-QWG- 22 8 g i l l 7 6 5 6 5 -VEDAITNKDSKFRLLYVTPEKLAK-SKKMMNKLEKSLSVGFLKLIAIDEVHCCS-QWG- 623 gi0464912 -TFNLFINGL--LDLVYISPEMISA-SEQCKRAISRLYADGKLARIWDEAHCVS-NWG- 817 g i l l 7 5 4 84 -VISFLMAKNVLVKLLYVTPEGLAS-NGAITRVLKSLYERKLLARIVIDEAHCVS-HWG- 66 0 g i 2 5 0 0113 -VLYQLGQQQ--LKLLYLS PETLL--SE PVWNLLRQPQ--VKLQGIMLDEAHCLV-QWG- 151 gi0861396 -VANNMHRIR FIYMSPEMVT--SQKGLELLTSCR--KHISLLAIDEAHCVS-QWG- 3 57 gil7054 86 -IYLQLSKKDPIIKLLYVTPEKICA-SNRLISTLENLYERKLLARFVIDEAHCVS-QWG- 8 04 gi2276199 -IYAELGSGNPSIKLLYVTPEKISA-SGRLNSVFFDLHRRGLLARFVIDEAHCVS-QWG- 627 gil066920 -IMGELAKEKPTIRMLYLTAEGVA--TDGTKKLLNGLANRDVLRYIWDEAHCVT-QWG- 315 gi3036880 -IYGEVERGE--TDVLLVS PERLNS-VDFRDQVLPRLA--ATTGLLWDEAHCIS-DWG- 163 gi2642224 -PIVLVSTEA AITLAFKEYAN RLLQQQRL DRIVIDECHLTL-TAR- 157 gi2131417 PEERAYIRKN--ARVIFTNPDMIH--TSILPNHANWRHFLYHLKLWVDELHIYKGLFGS 434 gi2621738 - ARDELRIPETDVS KADIWGTYEG - MDYILRAGRSGILG - DVGVWI DE IHTLE - DEE - 362 gi212 8837 -IGKKVDVET--SLDADIIVGTYEG-IDYLIRTKRLKD IGTWIDEIHSLN-LEE- 34 5 : * * * <- Domain II -> 123 Figure 32. 
CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 HDFRDSFRKLGSLK gi2130973 HDFRSSFRMLGSLK gi3420290 HDFRSAYRSLGSLK gi2851488 HDFRPEYAALGQLR gi2619051 HDFRPSYRNIEILF gil710074 HDFRPDYSKLGQLR gil931649 HDFRPDYQSLGILK gil082337 HDFRPDYKALGILK g i l l 7 6 5 6 5 HDFRTDYS FLNVLK gi0464912 HDFRPDYKELKFFK g i l l75484 HDFRPDYKQLGLLR gi2500113 DSFRPAYRRLGALR gi0861396 HDFRNSYRHLAEIRNRSDLCNIP-gil705486 HDFRQDYKRMNMLR--QKFPSVP-gi2276199 HDFRPDYTKLSSLREKYANPPVP-gil066920 HDFRPDYLTLGSLR--DVCPGVP-gi3036880 HDFRPDYRRLRTML--AELPAGVP gi2642224 SYRRSMMQLAWHVR DVETQT-gi2131417 HVALVMRRLLRLCHCFYENSGLQ-gi2621738 RGS RLKGMIKRIRR LFPDAQ-gi2128837 RGARLDGLIGRLRF LFKEAQ-TALPMVP-TALPLVP-RMLPNVP-QRFPTLP-RELHDKPV KKLGHPP-QKFPNIP-RQFPNAS-RQFKGVP-REYPDIP-DRYQGIP-IVALTATASSSIR VIALSATASSSIR IVALTATASPSIR FMALTATADDTTR IMALTATATPEVH VLALTATATKETL VLALTATATASVK LIGLTATATNHVL ILGLTATATSNVL MIALTATASEQVR FMALTATANEIVK RGLGRDKGQIPLAAFTATADRQQQ -MIALTATATVRVR -VMALTATANPRVQ -IIALTATATPKIV -WVALTATANAKAQ -VLATTATANARVT -VWLTATLP-PIFE -EDIVRCLNLRN--EDIISCLNLKD--EDITKSLNLHN-- QDIVRLLGLND--DDICKQLHIQK-- QDVMNLLELQH-- EDWQALGLVN --TDAQKILCIEK-- DDVKDMLGIQA--MDIIHNLELKE--KDIINTLRMEN-- NLIVEGLNLRS--DDVIANLRLRK--KDILTQLKILR-- TDARDHLKMQN--PQITC 727 -PQITC 691 -PQVTC 668 -PLIQI 205 -ENTVY 199 -AVRHL 195 -CWFR 311 -CFTFT 2 78 -ALTFR 6 73 -PVFLK 867 -CLELK 710 -PECFQ 2 05 -PLITT 409 -PQVFS 854 -SKLFI 679 -PESFK 365 DDIAFQLKLRN-ADVAEQLGTGAGDALVLR 216 DAFISHNKLTK--PLIVR 2 05 - -FISCSATLKS PVQHMKDMFGINEVTLIHEDGS P 4 90 - -11ALSATVKNNLE VAS-EFGLRLVEYDRRP 411 - -KIYLSATIGNPKE LAKQLNAKLVLYNGRP 394 <- Domain III -> gil280208 TGFDRPNLYLEVRRKTGN ILQDLQPFLV--KTSSHW EFEGPTIIYCPSRK 775 gi2130973 TGFDRPNLYLEVGRKTGN---ILQDLKPFLVR-KASSAW EFEGPTIIYCPSRK 740 gi34202 90 TSFDRPNLYLDVARKTTN ISIDLRQFLIKKQQGSGW EFEGATIVYCPTRK 718 gi2851488 SSFDRPNIRYMLMEKFKP LDQLMRYVQE QRGKSGIIYCNSRA 247 gi2619051 TGFSRENLTFKWKGENK DRFIDEYVQN NRHEAGIVYTATRK 241 gil710074 NSVNRPNIALRVENAADT AEKIDRVIQL - - V E NLQGPGIVYCPTRK 23 9 gil931649 
QS FNRPNLWYS WPKTKKC - - LED IDKFI KEN HFDECGIIYCLSRM 355 gil082337 ASFNRPNLYYEVRQKPSNT--EDFIEDIVKLIN GR YKGQSGIIYCFSQK 325 g i l l 7 6 5 6 5 AGFNRSNLKYKWQKPGSE--DECTEEIAKTIKR D FAGQTGIIYCLSRN 720 gi0464912 QSFNRTNLYYEVNKKTKNT--IFEICDAVKS R FKNQTGIIYCHSKK 911 g i l l75484 SSFNRPNLFYEIKPK-KDL--YTELYRFISN G -HLHESGIIYCLSRT 753 gi2500113 VSPHRPQLHLKVKMVLSEYCRRQQLRRFLLK HLQESGLIYVRTRT 250 gi0861396 TSFDRKNLYISVHSSKDMAEDLGLFMKTDEVKG-----R HFGGPT11YCQTKQ 457 gil705486 MS FNRHNLKYYVLPKKPKKVAFDCLEWI RKH HPYDSGIIYCLSRR 899 gi2276199 SSFVRDNLKYDLIPKAARS-LINWEKMKQL YPGKSGIVYCLSRK 723 gil066920 SGTYRDNLFYDNHMASFITKCLTVDAKTSSSNLTKHEKAERSQNKKTFTGSAIVYCRSRN 425 gi3036880 GPLDRESLRLGVLVLPDA AHRLAWLGERLG ELPGSGIIYTLTVA 260 gi2642224 ESTNRSNLRYSVRTAEHRMSGMTCYDAVRWDESRARTDIWN GQRDRIIVYCTSKE 261 gi2131417 TGAKHLWWNPPILPQHERKRENFIRESAKILVQ LILNNVRTIAFCYVRR 540 gi2621738 VPLERHLVFSRGEEDKKNLILRLAEREFSTESEK GFRGQTIVFTNSRR 459 gi2128837 VPLERH11FCKNDFAKLNII KE IVKREWQNIS K FGYRGQCLIFTYSRK 442 <- Domain IV 124 Figure 32. 
CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) g i l 2 802 0 8 MTQQVTGELR KLN LSCGTYHAGMSFSTRKDIHHRFV-RDEIQCVIATIAFG 825 gi2130973 MTEQVTAELG KLN LACRTYHAGMKISERKDVHHRFL-RDEIQCWATVAFG 790 gi3420290 TSEQVTAELI KLG IACGTYHAGMGIKQRREVHHRFM-RDEIHCWATVAFG 768 gi2851488 KVEDTAARLQ SKG 1SAAAYHAGLENNVRADVQEKFQ-RDDLQIWATVAFG 297 gi2619051 EADRIYERLK RNQ VRAGRYHGGLADDVRKEQQERFL-NDELQVMVATSAFG 291 gil710074 WAKELAGEIKS KTS SRADFYHGGLESGDRILIQQQFI-HNQLDVICCTNAFG 2 90 gil931649 DCEKVSERLQ EFG HKAAFYHGSMEPEQRAFIQTQWS-KDEINIICATVAFG 405 gil082337 DSEQVTVSLQ -NLG IHAGAYHANLEPEDKTTVHRKWS-ANEIQVWATVAFG 375 g i l l 7 6 5 6 5 DCEKVAKALK SHG IKAKHYHAYMEPVDRSGAHQGWI-SGKIQVIVATVAFG 770 gi0464912 SCEQTSAQMQ RNG IKCAYYHAGMEPDERLSVQKAWQ-ADEIQVICATVAFG 961 g i l l75484 SCEQVAAKLRN DYG LKAWHYHAGLEKVERQRIQNEWQ-SGSYKIIVATIAFG 804 gi2500113 MAINLAQWLQ ERG FDSEAYHGGLGPHQRRQLEQKWL-TGQISSWCTNAFG 3 00 gi0861396 MVDDVNCVLR RIG- VRSAHYHAGLTKNQREKAHTDFM-RDKITTIVATVAFG 507 gil705486 ECDTMADTLQ RDG LAALAYHAGLSDSARDEVQQKWINQDGCQVICATIAFG 950 gi2276199 ECETVQMMLT KAG LSAEVYHAGLNDNLRVSVQRSWI-ANKFDVICATIAFG 773 gil066920 ECGQVAKMLE IAG IPAMAYHAGLGKKDRNEVQEKWM-NNEIPWAATVAFG 475 gi3036880 AAEEIAAFLR QRG YPVASYTGKTENADRLQAEEDLL-ANRVKALVATSALG 310 gi2642224 LVARLAEMLG CAAYSSESGSE-ADKAAIIQDWICGKGSPVIVATSALG 308 gi2131417 VCELLMKEVRNIFIETGREDLVTEVMSYRGGYSASDRRKIEREMF-HGNLKAVISTNALE 599 gi262173 8 KTRLIADYLT RRG VRAAAYHAGLSYRERQRIERAFA-SQELAAIVTTAALA 509 gi2128837 RAEYLAKALK SKG IKAEFYHGGMEYIKRRKVEDDFA-NQKIQCWTTAALS 492 : . : : * * : ... 
Domain IV -> <- Domain V -> gil280208 gi2130973 gi3420290 gi2851488 gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 gi0464912 gi l l75484 gi2500113 gi0861396 gil705486 gi2276199 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 MGINKADIRQVIHYGAPKD MGINKADIRQVIHYGAPKE MGINKPDIRKVIHYGAPKE MGINKPNVRFWHFDIPRN MGIDKSNIRFVLHAQIPKD MGVDKPDIRYVIHFHLPQT MGINKPDVRFVIHHSLPKS MGIDKPDVRFVIHHSMSKS MGIDKPNVRFVIHHSLPKS MGIDKPDVRFVYHFTVPRT MGVDKGDVRFVIHHSFPKS LGIDKPDTRWVLHYQAPLM MGIDKPDVRNVIHYGCPNN MGIDKPDVRFVIHASLPKS MGIDKPDVRFVIHYSLPKS MGIDKPDVRAVIHWSPSQN MGFDKPDLGFWHVGS PS S VGFDYPHVRFVIHLLGPDL LGIDIGGLDAVLMCGFPLS -MESYYQEIGRAGRDGLQSSCHVLWA--PADINLNRHL -MESYYQEIGRAGRDGLQSSCHLLWA--PADFNTSRNL - MESYYQEIGRAGRDGLPSCCHALWA--QADMNFNRHM - IESYYQETGRAGRDGLPAEAMLFYD- - PADMAWLRRC - MESYYQEAGRAGRDGLAS E CVLLFS--PQDIMVQRFL -AEAFMQEIGRAGRDGKPSVSILLRA--PGDFELQEQI -1EGYHQECGRAGRDGQRSSCVLYYG--YGDYIRVKHM - MENYYQESGRAGRDDMKADCILYYG--FGDIFRIS SM -1ENYYQESGRAGRDGQPATCILYYR--LADIFKQSSM - LEGYYQETGRAGRDGNYSYCITYFS--FRDIRTMQTM - LEGYYQETGRAGRDGKPAHCIMFYS--YKDHVTFQKL -LMDYLQEVGRAGRDLQPAECLTLVS--EPTGWLDSGD - IESYYQEIGRAGRDGS PSICRVFWA--PKDLNTIKFK - VEGYYQESGRAGRDGEISHCLLFYT--YHDVTRLKRL - IEGYYQETGRAGRDGMPSYCLMLYS- - YHDSIRLRRM -LAGYYQEAGRAGRDGKRSYCRIYYS--KQDKNALNFL - PIAYYQQVGRAGRGVDHADVLLLPG--REDEAIWAYF •PQLDDRAPAS •LTDFSQESGRAGRDGMPAESILLAG-- MANFHQQSGRAGRRNNDSLTLWASDS PVDQHYVAHP AGVDFPASQWFETLLMGNRWLMPNEFAQMLGRAGRPSYHDRGWYVL- -AEVGMEFDGE AGVDFPASTVILESLAMGADWLNPAEFQQMCGRAGRKGMHEIGKVYLL--VEIGKKYHAK 879 844 822 351 345 344 459 42 9 824 1015 858 354 561 1004 827 529 364 362 655 567 550 4- Domain VI -> 125 Figure 32. 
CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 LT EIRNEKFRLYKLKMMAKMEKYLHSS--RCRRQIILSHFEDKQ 921 gi2130973 LI EIHDEKFRLYKLKMMVKMEKYLHSS--QCRRRIILSHFEDKC 886 gi3420290 LG EIPNKGFREYKLKMLTKMEKYLNSS- -TCRRKIILSHFEDKQ 864 gi2851488 LE EKPQGQLQDIERHKLNAMGAFAEAQ--TCRRLVLLNYFGE- - 391 gi2619051 IE QSEHEEKQKQDLKKLRQMVDYCHTE--DCLQRFILMYFGEK- 386 gil710074 I QMESVTAEEI-ADVIRVLEKTEERD--ERRLRDVLLQYGV-- 382 gil93164 9 ISQGGVDQSPMATGYNR-VASSGRLLETNTENLLRMVRYCENEV-ECRRFLQLVHLGEK- 516 gil082337 W MENVGQQKLYEMVSYCQNIS-KCRRVLMAQHFDEV- 465 g i l l 7 6 5 6 5 VQ QERTGIQNLYNMVRYAADSS-TCRRVKLAEHFEEA- 860 gi0464912 IQK D KNLDRENKEKHLNKLQQVMAYCDNVT - DCRRKLVLS YFNED - 1059 g i l l75484 IMS GDGDAETKERQRQMLRQVIQFCENKT-DCRRKQVLAYFGEN- 901 gi2500113 RQ LRQYFLSQASKYLQRAEVLSQQIPSQGNLGQLKAHFPD-- 394 gi08613 96 LRN SQQKEEWENLTMMLRQLELVLTTVG--CRRYQLLKHFDPS- 603 gil705486 IMME KDGNHHTRETHFNNLYSMVHYCENIT - ECRRIQLLAYFGEN - 1048 gi2276199 IEE GNTTTGVRSMHLNNVLQWAYCENVS-VCRRKMLVEHFGEV- 870 gil066920 VSGELAKLREKAKKNNAEGEKAEMQIKSIQTGLAKMLEYCESAR--CRHVSIASFFDDT- 586 gi3036880 ASVGFP PEEQVRRTLAVLEEAGRPMSLPALEPLVDLRRSRLETMLKVL- 412 gi2642224 GK ASSAEKGKVAPGADKEAMQLYRSRK--YCLRGVLSQLLDQR- 403 gi2131417 ES LLEVNNFESYQDLVLDFNNILILEGHIQCAAFELPINFERD- 698 gi2621738 SEE AMALKLLESGPDPVDVNYTEEDVLENILADITSGAIKSESEIS- 613 gi2128837 MEN TEDEVAFKLLNAVPEDVKVE YNE - - DEEEEQILATI SAG - 590 gil280208 VQKASLGIMGTEKCCDN CRSRLDHCYSMDDSEDTSWD FGPQAFKLLSAVD 971 gi2130973 LQKASLDIMGTEKCCDN CRPRLNHCLTANNSEDASQD FGPQAFQLLSAVD 936 gi3420290 LRKASSGIMGTEKCCDN CKTRLICNISINDTEDNLQD FGPQAYKFISAVD 914 gi2851488 GRQEPCGN CD ICLDPPKQYDGSTD AQIALSTIG 424 gi2619051 EPD-ACGQ CGNCTDTRAAHDVTRE AQMVLSCII 418 gil710074 GET Q ARMMIHLFMQGKTSVE LMKKEISY 410 gil931649 FDSTNCKKT CDNCCSSQSLIDKDVT LITRQLVELVK 552 gil082337 WNSEACNKM CDNCCKDSAFERKNIT E YCRDLIKILKQAE 504 g i l l 7 6 5 6 5 WEPSWCQKQ CDTCENGNGTFAIP 883 gi0464912 FDSKLCHKN CDNCRNSANVINEERD VTEPAKKIVKLVE 1097 g i l l75484 FDKVHCRKG CD ICCEEATYIKQDMT EFSLQAIKLLK 937 gi2500113 
LEMALAWLH RRGNLE WLD P FN YR IN PGHYQANPLE 429 gi0861396 YAKPPTMQADC CDRCTEMLNGNQDS S SS IVD VTTESKWLFQVIN 647 gil705486 GFNPDFCKKHPDVSCDNCCKTKDYKTRDVTDDVKSIVRFVQEHSSSQGMRNIK 1101 gi2276199 YDEQSCRNSK-TPCDICERQRKNAEAIRL FDVSTDALSILK 910 gil066920 ECRPCKTN CDYCRDPTKTIRNVEA FINSEASTGRS 621 gi3036880 DVDGAVKR VKGGWAATGQAWAYDAER YAWVARQRQAEQQ 451 gi2642224 SDWRWCMEGD-QLCSVCPGHHFQARGPGDE FHFTAPAQAGDPS 445 gi2131417 KQYFTESHLRKICVERLHHNQDGYHASN RFLPWPSKCVS 737 gi2621738 PDPSWPLDPE-GALDILESHGMIVRDGGLR ATGYGVAVSKSFIG 656 gi2128837 ITNRYDIDR-VPYIGRAFSLNKILSNLES YGMIKANNDVK 629 126 Figure 32. CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 ILG--EKFGIGLPILFLRGSNSQRLADQ-YRR HSLFGTGKDQTESWWKAFSRQLIT 1024 gi2130973 ILQ--EKFGIGIPILFLRGSNSQRLPDK-YRG HRLFGAGKEQAESWWKTLSHHLIA 989 gi3420290 VLG--QKFGTGVPVLFLRGSTSQRVPDR-FRN HSLFSSGKDQTEAFWKVLARQLIT 967 gi2851488 RVN--QRFGMGYWEVIRGANNQRIRDYGHDK LKVYGMGRDKSHEHWVSVIRQLIH 478 gi2619051 RMK--ERFGKTMVAQVLAGS KNKKVLENGFSD LSTYGILKHQSVGEISDFIEFLIS 472 gil710074 RME--LKLEKMHRVSFLLQ-RDGCLRQA L LTYFDESYEPDDGNLPCCSHCGFD 4 60 gil931649 QTG--ERFSSAHILEVYRGSLNQMVKKHRHET LQFHGAGKHLSKIEVSRILHYLVT 606 gil082337 ELN--EKLTPLKLIDSWMGKGAAKLRVAG--V VAPTLPREDLEKIIAHFLIQQYLK 556 g i l l 7 6 5 6 5 gi04 64 912 SIQN-ERVTIIYCQDVFKGSRSSKIVQANHDT LEEHGIGKSMQKSEIERIFFHLIT 1152 g i l l75484 SIS--GKATLLQLMDIFRGSKSAKIVENGWDR LEGAGVGKLLNRGDSERLFHHLVS 991 gi2500113 ELK--SQYRLMTQYLTTSRCRWQTILVA FGDNSPAARRPCGTCDNCLVG 476 gi0861396 EMYN-GKTGIGKPIEFLRGSSKEDWRIKTTSQ QKLFGIGKHIPDKWWKALAASLRI 702 gil705486 HVGPSGRFTMNMLVDIFLGSKSAKIQSG IFGKGSAYSRHNAERLFKKLIL 1151 gi2276199 CLPRMQKATLKYISELYRGALIKKSQEQAMRLGHTKLPFYSKGQGMSEQDALRFVRKLVI 970 gil066920 MFRKSASSGESGFDSVYGGGKRGGETEDELLS AASTSKDAMDRMEQEEAKRVRSVIS 678 gi3036880 AMREYVSTTRCRMEFLQRQLDDEKAAPCGRCDT CAGPWLDPAVSAGALAAATGELDR 508 gi2642224 TQGSSQRSIQGRGHPSIHGSGFPSIHGSSHPS IHGSSHPSIHGSSHPFIHGSGQ 4 99 gi2131417 LRG--GEEDQFAWDITNGRNIIIEEIEASRT SFTLYDGGIFIHQGYPYLVKEFNP 791 gi2621738 VTD--AEYIRGHLKGASRRPLDIALELEPFEG 
AYLSGRLHRALSRAAGARFSMNLMA 711 gi2128837 LTN YGSAVAISFLYPKVAEKIKEGIIEN KEIIKLITEIMPFENVYLSNNLKI 681 gil280208 EGFLVEVSRYNK--FMKICALTKKG-RNWLHKANTESQ-SLILQANEELCPKKFLLP--S 1078 gi2130973 EGFLVEVPKENK--YIKTCSLTKKG-RKWLGEASLQSPPSLLLQANEEMFPRKVLLP--S 1044 gi3420290 EGYLQESSGQTK--FSTICGLTSKG-SNWLIKANNEQCPSLLLPSNNELCLQRTRVSNFS 1024 gi2851488 LGLVTQNIAQH SALQLTEAA-RPVLAESSLQLA VPRIVALK 518 gi2619051 DDFIRMSDGTF PTLFVSSKG-RNVLKGELSVAR K EALK 509 gil710074 LSLYEQKGERSK--MAPLDSWSSEL-HRIFSLQTVGELN 496 gil93164 9 EDILVEDVRKS DMYGSVSSL-LQVLNCELFPFRFSLSVMIEDDSKLTGTLIVSIY 660 gil08233 7 EDYS FTAYAT I S - - YLKIGPKANLL - NNEAHAITMQVTKS TQNSFRAE 601 g i l l 7 6 5 6 5 gi04 64 912 IRVLQEYSIMNNSGFASSYVKVGPN-AKKLLTGKMEIKMQFTISAPNSRPSTSSSFQANE 1211 g i l l75484 EGVFVEKVEANR--RGFVSAYWPG-RQTIINSVLAGKRRIILDVKESSSKPDTSSRSLS 1048 gi2500113 RC . 478 gi08613 96 AGYLGEVRLMQMKFGSCITLSELGE-RWLLTGKEMKIDATPILLQGKKEKAAPSTVPGAS 761 gil705486 DKILDEDLYINANDQAIAYVMLGNK-AQTVLNGNLKVDFMETE NS SSVKKQ-- 12 01 gi2276199 EGYIHERLYSVPNQAAAVFAYAELT-EAGRDLANGKKTAKVYLHIVTCERKRKNAGLIEL 1029 gil066920 QEFAKRRQAAPPPRATARRVEPATD-VNVIKPEQNVIKNVTLETRENWVRFLHRALDS-- 735 gi3036880 PGVEVE PRKMWP-TGLAAVGMDLKG-RIPAGRQALTGRALGRLSDIGWGNRLRPLLS 563 gi2642224 HGGQRRKQQPDPPSEQRGDDWDQGE-TDIVGVDAIDVDANDELDALQGPETRMTYTGP-- 556 gi2131417 DERYAKVQRVDVDWVTNQRDFTDVDPQEIELIRSLRNSDVPVYFGKIKTTIIVFGFFKVD 851 gi262173 8 DSTLDILSDGD NLVKLDSKL-QEAVLNLQMDFLSCECRDRPFCGCIQRRLSEH-- 763 gi2128837 KLSKILNINVPS-RFFDALEVIREG-MEKIKDKKLKEDLTLIIMEFEGVEVEEKILEM-- 737 127 Figure 32. 
CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 SKTVSSGTKEHCYNQVPVELSTEKK--SNLEKLYSYKPCDKISSGSNISKKSIMVQSPEK 1136 gi213 0973 SNPVS PETTQHS SNQNPAGLTTKQ SNLERTHSYKVPEKVSSGTNIPKKSAVMPSPGT 1101 gi342 02 90 SAQAHSSMVPHASSNTRSSMPKAGPEKMELKDKFSYQEAERLSKAAGVSKSSFKLQTPCK 10 84 gi2851488 PKAMQK SFGGNYDR K 533 gi2619051 AAAITEND E 518 gil710074 gil931649 LFVSR 665 gil082337 SSQTCHS 608 g i l l 7 6 5 6 5 gi0464912 DNIPVIAQKSTTIGGNVAANP PRFISAKEHLRSYTYGGS TMGSSHPITLKN 1262 g i l l75484 RSKTLPALREYQLKSTTASVDCSIGTREVDEIYDSQMPPVKPSLIHSRNKIDLEELSGQK 1108 gi2500113 gi0861396 RSQSTKSSTEIPTKILGAN KI RE YE PANENEQLMNLKKQ 800 gil705486 -K ALVAKVSQRE 1212 gi2276199 SN MN IVSEAQALKERHMVKHG 1050 gil066920 ' NWIVSGPP 743 gi3036880 AQAADG-P 570 gi2642224 SEIRSQRWQHTN 568 gi2131417 KYKRIIDAIETHNPPVIIN SKGLWIDMPKYALEICQK 888 gi2621738 IIAERISGRD 773 gi2128837 IINLRISGKT 747 g i l 2 802 0 8 AYSSSQPVISAQEQETQIVLYGKLVEARQKHANKMDVP-PAILATNKILVDMAKMRPTTV 1195 gi2130973 S S S PLEPAISAQELDARTGLYARLVEARQKHANKMDVP-PAILAANKVLLDMAKMRPTTV 1160 gi34202 90 LSRPPEPEVSPRERELQTTLYGRLWARQKIASERDIL-PAVLATNKVLVDMAKLRPTTS 1143 gi2851488 LFAKLRKLRKSIADESNVP-PYWFNDATLIEMAEQMPITA 573 gi2619051 LFERLRMVRKEIAAEQGVP-PFWFSDQTLKEMSGKQPVND 558 gil710074 gil931649 gil082337 E - -QGDKKMEE-KNSGNFQKKA-ANMLQQSGSKNTGAKKRKIDD 648 g i l l 7 6 5 6 5 gi0464 912 TSDLRSTQELNNLRMTYERLRELSLNLGNRMVPP-VGNFMPDSILKKMAAILPMND 1317 g i l l75484 FMS EYEIDVMTRCLKDLKLLRSNLMAIDDSR-VS SYFTDSVLLSMAKKLPRNV 1160 gi2500113 gi0861396 - - E VTGLPEKIDQLRSRLDDIRVGIANMHEVA-PFQIVSNTVLDCFANLRPTSA 851 gil705486 EMVKKCLGELTEVCKSLGKVFGVH-YFNIFNTVTLKKLAESLSSDP 1257 gi2276199 DVFTRCLQDLTHLITAVAESSGLSGPYSIVSREGIEQIAALLPRTN 1096 gil066920 AGVTTKQCAEQLEYGLYSISK-NETTYKNKCGHKLAEIKKLTL 785 gi3036880 VPDDVLRAWTVLADWARSPGGWATGSPDAVARPVGWAVPS 612 gi2642224 E--ESEYRQNMEAIKGMCMVCRVSGVNWHHAAGTCSDRFGWIRAKT 612 gi2131417 KQLNVAGAIHGAQHAIMGMLPRFIVAGVDEIQTECKAPEKEFAERQTK 936 gi2621738 PVEISRGLLSRYQIQAYPGDIFSWLDTTVRALESIGRIASAFK 816 gi2128837 PGQIS KTLYEE FKI QT 
YSGDIYYYLEQLLNLLDATERIARI FN 790 128 Figure 32. CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 gi2130973 gi3420290 gi2851488 gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 gi0464912 g i l l75484 gi2500113 gi0861396 gil705486 gi2276199 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 gil280208 gi2130973 gi3420290 gi2851488 gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 gi04G4912 g i l l75484 gi2500113 gi0861396 gil705486 gi2276199 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 gi2128837 ENVKRIDGVSEGK-AAMLAPLLEVIKHFCQTN SVQTDLFSSTKPQEEQKTS 124 5 ENMKQIDGVSEGK-AALLAPLVGVIKHFCQVT SVQTDLLSSAKPHKEQEKS 1210 ENMKKLDGVSEAK-SAMLAPLLEWKEFCIAN SLKVDVFSGSVSQSESTFF 1193 SEMLSVNGVGMRKLERFGKPFMALIRAHVDGD DEE 608 DELLS IKGVGEQKRAKYGRLFLQEIQAYARMT D 591 A 649 SAFATLGTVEDKYRRRFKYFKATIADLSKKRS SEDHEKYDTILNDEFVNRA 13 68 KELKEIHGVSNEKAVNLGPKFLQVIQKFIDEKEQNLEGTELDPSLQSLDTDYPIDTNALS 122 0 SNLEMIDGMSAQQKSRYGKRFVDCWQFSKETG IATNVNANDMIPPELISKMQKVL 907 EVLLQIDGVTEDKLEKYGAEVISVLQKYSEWTS PAEDSSPGISLSSSRGPGRS 1310 SDLLRIDSMTQIKVTKYGRLIMELLATYWKQVD EREEEEMRNQLDKLKSGEIV 114 9 KNAPFAYTNTEVEQNGFVKASNLS 809 RTRPQLVGSLAEGVARVGRLPLLGSLAHTPHAD EYAAHRSNSAQRLRALAESF 665 EVMERCRSKKKRWMPWLKVCWRCFQPQWLCRAA DPAMSEGAGPSKVARRGRG 664 RKRPARLIFYDSKGGKYGSGLCVKAFEHIDD11 ESSLRRIEECPCSDGCPDCV 989 RRRYLKECSGIVRAIEKGRGA 83 7 KRYAEKVKELKEKIENPK 808 LVAKNKICTLSQSMAITYSLFQEKKMPLKSIAESRILPLMTIGMHLSQAVKAGCPLDLER 13 05 QEMEKKDCSLPQSVAVTYTLFQEKKMPLHSIAENRLLPLTAVGMHLAQAVKAGYPLDMER 1270 TPREQERISLPESQRMSYSLFQEQNLSLKKIADVRCLSMAWGMHLWQALKAGYSFDVQR 12 53 AASSNGIAQSTGTKSKFFGANLNEAKENEQIINQIRQSQLPKNTTSSKSGTRSISKSSKK 142 8 LDHEQGFSDDSDSVYEPSSPIEEGDEEVDGQRKDILNFMNSQSLTQTGSVPKRKSTSYTR 12 80 SDATORVTTEHLISRSTAKEVATARGISEGTVYSYLAMAVEKGLPLHLDKLNVSRKNIAM 967 -AAEELDEEIPVSSHYFASKTRNERKRKKMPASQRSKRRKTASSGSKAKGGSATCRKISS 13 69 MGGFATLQSDPGFPSVPYMKPLGGGGGCRGRGKKRAFSGFSSGRATKKPRATAPSARGKT 1209 TVPGELAAALAATDGPVLLVDDFTDSGWTLAVGARLLRQAGADDVLPLVLALAG 719 
TAAEESRVNRTECEYRDLVIPLCHAVFKKEARTDWLRATFHVEFSDVNEYMLWVGTPAML 724 AASFCKENSLVLSKPGAQWLHCILGHSEDSFIDLIKDGPEPNMPEIKVETVIPVSEHVN 104 9 129 Figure 32. CLUSTALW MSA for RecQ Subfamily Sequences (cont'd) gil280208 AGLTPEVQKIIADVIRNPPVNSDMSKISLIRMLVPENIDTYLIHMAIEILK-HGPD-SGL 1363 gi213 0 973 AGLTPETWKIIMDVIRNPPINSDMYKVKLIRMLVPENIDTYLIHMAIEILQ-SGSD-SRT 132 8 gi342 0290 AGLTPEMKKLITYAIKKPPINSDLSSFKAIREYVPANIDGYPIRMVISLLEKEGSSGAQG 1313 gi2851488 gi2619051 gil710074 gil931649 '-gil082337 g i l l 7 6 5 6 5 gi0464912 SANGRRGFRNYRGHYRGRK 1447 g i l l75484 PSKSYRHKRGSTSYSRKRKYSTSQKDSRKTSKSANTSFIHPMVKQNYR 1328 gi2500113 gi0861396 ALNAVRVHLGSNVAVLTPWVEAMGWPDFNQLKLIRAILIYEYGLDTS ENQE KPDIQSMP 1027 gil705486 KTKSSSIIGSSSASHTSQATSGANSKLGIMAPPKPINRPFLKPSYAFS 1417 gi2276199 SGRGGAKPATSLKRNMYPATSM 1231 gil066920 gi3036880 gi2642224 AGKECVMANCVAAAQLQVWDRRDDDEER 752 gi2131417 FSDDFKIIDVRRATKDDTHTNE11KKEI 1077 gi2621738 gi2128837 gil280208 QP SCDVN KRRCFPGSEEICS 1383 gi2130973 QP PCDSS RKRRFPSSAESCE 1348 gi3420290 QPEFPTQKTLIQTEENPKNVSVQNTKHKVTMGKSMWIEKKPTQPATAELEVTKGKALAPI 13 73 gi2851488 gi2619051 gil710074 gil931649 gil082337 g i l l 7 6 5 6 5 gi0464912 gi l l75484 • gi2500113 gi0861396 STSNPSTIKTVPSTPS SSLRAPPLKKFKL. 1056 gil705486 gi2276199 gil066920 gi3036880 gi2642224 gi2131417 gi2621738 : gi2128837 130 Figure 32. 
CLUSTALW MSA for RecQ Subfamily Sequences (cont'd)

gil280208 S S KRS KEEVGINTETSS AERKRRLPVWFAKG SDT - S KKLMDKTKRGG - 1429
gi2130973 SCKESKE-WTETKASSSESKRKLPEWFAKGNVPSADTGSSSSMAKTKKKG- 1398
gi3420290 MLASWNEASLDADTEELFSESQSSTTRPRRRLPEWFGSTKGNAATRCIQESKNLGEEKGS 1433

gil280208 LFS 1432
gi2130973 LFS 1401
gi3420290 FFD 1436

b) Comparison of Alternate Molecular Phylogenies for WRN Homologous Genes

A comparison of the molecular phylogenies of Figure 30(b) and Figure 31 reveals that they are not identical in their predictions of related sequences. There appears to be, however, good agreement in the ortholog assignments between Caenorhabditis elegans loci and the corresponding human loci, confirming the earlier preliminary analysis of the sequences using the domain VI specific MSA. These results unambiguously indicate that T04A11.6 is the Bloom syndrome orthologous locus. The worm locus K02F3.1 appears to map onto human RECQL. For WRN, however, the ClustalW and Bete trees mildly disagree: the former asserts a closer association of F18C5.2 with WRN, whereas the latter suggests that E03A3.2 is slightly closer to WRN than F18C5.2. A closer examination of the most highly conserved helicase motifs suggests that the ClustalW result is likely correct, since several residues are shared only between F18C5.2 and WRN, while E03A3.2 has diverged considerably from WRN within these sequence motifs. It is possible that additional homologous helicase loci remain to be characterized in the worm genome, the human genome, or both.
It is also possible that the human genome contains several paralogs corresponding to a single nematode locus, in the same manner that Sgs1 appears to root the eukaryote paralogs of higher animals. Until complete sequence data are available for the genomes concerned, the proposed ortholog mappings must be taken as provisional assignments.

c) Identification of Conserved Residues by Comparison of Orthologous Loci

The above caveat notwithstanding, one may take the four proposed orthologous gene mappings for WRN-related helicases from Caenorhabditis elegans onto Homo sapiens as a working hypothesis for further detailed analysis, leading to the discovery of highly conserved sequence residues that might have subfamily-specific functional relevance. That is, the identification of candidate orthologs amplifies the "signal" of functional conservation in sequence alignments to the greatest degree possible. If the evolutionary distance between the two species is large enough, intervening sequences will not generally be subject to significant selective pressure and will become increasingly randomized. In contrast, subfamily-specific critical functional residues will tend to be conserved.

Candidate orthologs may be subjected to rigorous pairwise comparisons in order to identify critical residues. Useful observations will be global sequence identities over the protein sequence. Conservation of residue character (e.g. hydrophobic vs. hydrophilic, large vs. small, etc.) would also likely be apparent. Many such residues would not be highlighted to a great extent in MSAs comparing paralogous as well as orthologous sequences (e.g. "all the RecQ helicases"), since distinct ortholog "subfamilies" would not necessarily conserve the same residues at the same positions.
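The pairwise comparison described above can be illustrated with a minimal sketch. It assumes two already-aligned protein strings of equal length with "-" for gaps, and marks each column as identical ("*"), conserved in residue character (":"), or neither (" "). The function names and the residue-character grouping are hypothetical choices made for illustration; they are not the scoring used by ClustalW or BLAST2, nor the procedure used in this thesis.

```python
# Illustrative residue-character groups (an assumption, not a published matrix).
RESIDUE_CLASSES = [
    set("AVLIMC"),  # aliphatic / hydrophobic
    set("FWY"),     # aromatic
    set("STNQGP"),  # small or polar
    set("DE"),      # acidic
    set("KRH"),     # basic
]


def residue_class(aa):
    """Return the index of the character group containing aa, or None."""
    for index, members in enumerate(RESIDUE_CLASSES):
        if aa in members:
            return index
    return None


def classify_columns(seq_a, seq_b):
    """Mark each aligned column: '*' identical, ':' same character group, ' ' otherwise."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            marks.append(" ")  # gap columns are never counted as conserved
        elif a == b:
            marks.append("*")
        elif residue_class(a) is not None and residue_class(a) == residue_class(b):
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


def percent_identity(seq_a, seq_b):
    """Identical columns as a percentage of columns without gaps."""
    marks = classify_columns(seq_a, seq_b)
    ungapped = sum(1 for a, b in zip(seq_a, seq_b) if a != "-" and b != "-")
    return 100.0 * marks.count("*") / ungapped if ungapped else 0.0


# Domain V fragment of the BLM / T04A11.6 pair, as printed in the alignments.
blm = "MGIDKPDVRFVIHASLPKSVEGYYQESGRAGRDG"
worm = "MGIDKPDVRFVIHYSLPKSIEGYYQETGRAGRDG"
marks = classify_columns(blm, worm)
identity = percent_identity(blm, worm)
```

Here `marks` lines up column by column with the two fragments, so residues that differ between the worm and human sequences but retain their character stand out from those that are fully conserved.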
Caenorhabditis elegans and Homo sapiens are separated by hundreds of millions of years of evolution, suggesting that the determination of orthologous gene pairs in these two species might help to elucidate highly conserved functional residues by pairwise comparison of the sequences. Such an analysis was undertaken here for the WRN, BLM and RECQL worm/human orthologous gene pairs. The E03A3.2 candidate human EST homolog was considered too short for a meaningful global comparison at this time; however, in order to further discriminate between F18C5.2 and E03A3.2 as candidate orthologous loci to WRN, a ClustalW MSA is also provided between WRN and E03A3.2.

A similar comparison between Xenopus laevis, Mus musculus and Homo sapiens can also be made. The evolutionary divergence between these species is somewhat smaller. However, mutational drift within variable loop regions may still be evident as a lower average percentage of identities between subsequences of the orthologs, in contrast to an almost 100% identity in subsequences experiencing some form of natural selection preserving residues of structural or functional significance. This phenomenon is observed in the MSA of sequences for the WRN orthologous vertebrate loci in frog, mouse and human (Figure 40).

General Notes for Figure 33 through Figure 40

Either BLAST2 or ClustalW was employed in this series of figures to generate MSAs of candidate ortholog gene pairs. Canonical helicase motifs are annotated (e.g. <- Domain I ->). Some motifs exhibiting relatively high sequence conservation outside the canonical helicase motifs are boldly highlighted and framed by a box.

Figure 33.
BLAST2 Pairwise Comparison of the BLM and T04A11.6 Orthologs Query: gil705486 Length 1417 Subject: gi2276199 Length 1231 Score = 428 bits (1090), Expect = e-119 Identities = 263/680 (38%), Positives = 373/680 (54%), Gaps = 60/680 (8%) Query: 653 KEMMKIFHf KFGLHNFRTNQLEAINAALLGEDCFIliMPTGGGKSLCYQLPACVSPGVTVV 712 KE+ KFG + FR Q + I + L+G D F+LMPTG GKSLCYQLPA + PGVTW Sbjct: 476 KELYDTLKS KFGFNQFRHRQKQCILSTLMGHDTFVI.MPTGAGKSLCYQLPAVILPGVTW 535 4r Domain I -> ... Query: 713 ISPLRSLIVDQVQKLTSLDIPATYLTGDKTDSEATNIYLQLSKKDPIIKLLYVTPEKICA 772 +SPLRSLI DQ K+ L I LT D IY +L +P IKLLYVTPEKI A Sbjct: 536 VSPLRSLIEDQKMKMKELGIGCEALTADLGAPAQEKIYAELGSGNPSIKLLYVTPEKISA 595 ... Domain la -> Query: 773 SNRLISTLENLYERKLIjARFVIDEAHCVSQWGHDFRQDYKRlVINMLRQKF- - PSVPVMALT 830 S RL S +L+ R LLARFVIDEAHCVSQWGHDFR DY +++ LR+K+ P VP++ALT Sbjct: 596 SGRLNSVFFDLHRRGLLARFVIDEAHCVSQWGHDFRPDYTKLSSLREKYANPPVPIIALT 655 <- Domain II -> <- ... Query: 831 ATANPRVQKDILTQLKILRPQVFSMSFNRHNXXXXXXXXXXXXXAFDCLEWIRKHHPYDS 890 ATA P++ D LK+ ++F SF R N + +E +++ +P S Sbjct: 656 ATATPKIVTDARDHLKMQNSKLFISSFVRDN-LKYDLIPKAARSLINWEKMKQLYPGKS 714 Domain III Query: 891 GIIYCLSRRECDTMADTLQRDGLAALAYHAGLSDSARDEVQQKWINQDGCQVICATIAFG 950 GI+YCLSR+EC+T+ L + GL+A YHAGL+D+ R VQ+ WI + VICATIAFG Sbjct: 715 GIWCLSRKECETVQN1MLTKAGLSAEVYHAGLNDNLRVSVQRSWI-ANKFDVICATIAFG 773 <- Domain IV -> 4-Query: 951 MGIDKPDVRFVIHASLPKSVEGYYQESGRAGRDGE! SHCLLFYTYHDVTRLKRLI[*1MEKD 1010 MGIDKPDVRFVIH SLPKS+EGYYQE+GRAGRDG S+CL+ Y+YHD RL+R+I Sbjct: 774 MGIDKPDVRFVIHYSLPKSIEGYYQETGRAGRDGMI'SYCLMLYSYHDSIRLRRMI ... Domain V -> ir Domain VI -> ++ -EE 830 Query: 1011 Sbjct: 831 -RETHFNNLYSMVHYCENITECRRIQLLAYFGENGFNPDFCKKHPDVSCDNCCK GNHHT-GN T R H NN+ +V YCEN++ CRR L+ +FGE GNTTTGVRSMHLNNVLQWAYCENVSVCRRKMLVEHFGE 1068 ++ C++ CD C + •VYDEQSC-RNSKTPCDICER 888 Query: 1069 TKD YKTRDVTDDVKSIVRFVQEHSSSQGMRNIKHVGPSGRFTMNMLVDIFLGSKSA 1124 + + DV+ D SI++ + + T+ + +++ G+ Sbjct: 889 QRKNAEAIRLFDVSTDALSILKCLPRMQKA TLKYISELYRGALIK 933 135 Figure 33. 
BLAST2 Pairwise Comparison ofthe BLM and T04A11.6 Orthologs (cont'd) Query: 1125 KIQS GIFGKGSAYSRHNAERXXXXXXXXXXXXXXXXXNANDQA-- IAYVM 1172 K Q + KG S +A R N A AY Sbjct: 934 KSQEQAMRLGHTKLPFYSKGQGMSEQDALRFVRKLVIEGYIHERLYSVPNQAAAVFAYAE 993 Query: 1173 LGNKAQTVLNG NLKVDFMETENSSSVKKQKALVAK-VSQREEMV 1215 L + + N G + E N + V + +AL + + + ++ Sbjct: 994 LTEAGRDLANGKKTAKVYLHIVTCERKRKNAGLIELSNMNIVSEAQALKERHKV'KHGDVF 1053 Query: 1216 KKCLGELTEVCKSLGKVFGVH-YFNIFNTVTLKKLAESLSSDPEyliLQIDGVTEDKLEKY +CL +LT + ++ + G+ ++I + ++++A L Sbjct: 1054 TRCLQDLTHLITAVAESSGLSGPYSIVSREGIEQIAALLPRTNSOLLRIDSMTQIKVTKY LL+ID +T+ K+ KY 1274 1113 Query: 1275 GAEVISVLQKYSEWTSPAED 1294 G ++ +L Y + E+ Sbjct: 1114 GRLIMELLATYWKQVDEREE 1133 Figure 34. CLUSTALW Pairwise Comparison ofthe BLM and T04A11.6 Orthologs gil705486 MAAVPQNNLQEQLERHSARTLNNKLSLSKPKFSGFTFKKKTSSDNNVSVTNVSVAKTPVL 60 gi2276199 MFTLPPVLNSSYVGIHGNSTFIN FEFSFYTLPMLFLLVPILYIPITIIIIL 51 * : : * . . : *. * : * * : * . . *.: :.:: :* gil705486 RNKDVNVTEDFSFSEPLPNTTNQQRVKDFFKNAPAGQETQRGGSKSLLPDFLQTPKEWC 12 0 gi2276199 R ILVKLYYAFRD--RNN-NVYLLS AISISQCMCLLFFLADFLY L 92 gil705486 TTQNTPTVKKSRDTALKKLEFSSSPDSLSTINDWDDMDDFDTSETSKSFVTPPQSHFVRV 180 gi2276199 --R-LPTSG LLTSWCASIEPNRFITI LTIFTYHINYSTMIFPFLVSIMRL 139 gil705486 STAQKSKKGKRNFFKAQLYTTNTVKTDLPPPSSESEQIDLTEEQKDDSEWLSSDVICIDD 240 gi2276199 ILIISPKNHKK--FNGQL LR FSIPFICVY- 166 gil705486 GPIAEVHINEDAQESDSLKTHLEDERDNSEKKKNLEEAELHSTEKVPCIEFDDDDYDTDF 3 00 gi2276199 -PIIFTFFMFPAIGYCSYAAYP FPFG-AIIFRIERTFFGLVNNF 208 gil705486 VPPSPEEIISASSSSSKCLSTLKDLDTSDRKEDVLSTSKDLLSKPEKMSMQELNPETSTD 36 0 gi2276199 S LLFNTLFWMTCCIITNFILLL -LLIKSRCLLN-AQTRSMHSYKVEVSLS 256 gil7054 86 CDARQISLQQQLIHVMEHICKLIDT-IPDDKLKLLDCGNELLQQRNIRRKLLTEVDFNKS 419 gi2276199 LTTFSMIFS-YLSNAMIVICSFFFWNYTSLAIMLRPFGNDLDTCVAPWVFYLTHPVFRKK 315 gi1705486 DASLLGSLWRYRPDSLDGPMEGDSCPTGNSMKELNFSHLPSNSVSPGDCLLTTTLGKTGF 479 gi2276199 ACVTTHSNYCFCFFFR FSLNFEIFCVSKGHYSF 348 : * * : * : * . . * . : : . : * : . 
* gil705486 SATRKNLFERPLFNTHLQKSFVSSNWAETPRLGKKNESSYFPGNVLTSTAVKDQNKHTAS 53 9 gi2276199 N--EYQQFPSRPQKRLVDP PIVDLDEE PPIVDLDDSFDNFHVGSTS 392 136 Figure 34. CLUSTALW Pairwise Comparison of the BLM and T04A11.6 Orthologs (cont'd) gil7054 86 INDLERETQPSYDIDNFDIDDFDDDDDWEDIMHNLAASKSSTAAYQPIKEGRPIKSVSER 599 gi2276199 EEWSGDIAPEEEEE EGHDSFDDFESVPAQPPS-KNTLASLQ KSDSE- 438 gil705486 LSSAKTDCLPVSSTAQNINFSESIQNYTDKSAQNLASRNLKHERFQSLSFPHTKEMMKIF 659 gi2276199 IALNQQRHDMHGRFRGFLQDDSEEFS DEVGLLGADMNKELYDTL 482 gil705486 gi2276199 HKKFGLHNFRTNQLEAINAALLGEDCE ILMPTGGGKSLCYQLPACVSPGVTWISPLRSL 719 KSKFGFNQFRHRQKQCILSTLMGHDTEVLMPTGAGKSLCYQLPAVILPGVTVVVSPLRSL 542 . * * * . . . * * * . * : ; * : * . * * : * * * * * . * * * * * * * * * * : ****** .****** <- Domain I -r> Domain la ... gil7054 86 I V D Q V Q K L T S L D I P A T Y L T G D K T D S E A T N I Y L Q L S K K D P I I K L L Y V T P E K I C A S N R L I S T 779 gi2276199 I E D Q K M K M K E L G I G C E A L T A D L G A P A Q E K I Y A E L G S G N P S I K L L Y V T P E K I S A S G R L N S V 602 * ** * . _* * _ * * * _ .** ; * . . ;* * * * * * * * * * * * . * * . * * *_ ... Domain la (cont'd) -> gil705486 L E N L Y E R K L L A R F V I D E A H C V S Q W G H D F R Q D Y K R M N M L R Q K F - - P S V P V M A L T A T A N P R V 837 gi2276199 F F D L H R R G L L A R F V I D E A H C V S Q W G H D F R P D Y T K L S S L R E K Y A N P P V P I I A L T A T A T P K I 662 . : * : . * ********************* * * . : : . * * : * : * * * ; . • * * * * * * • ' . 4- Domain I I - > 4- Domain I I I - > gil705486 Q K D I L T Q L K I L R P Q V F S M S F N R H N L K Y Y V L P K K P K K V A F D C L E W I R K H H P Y D S G I I Y C L S 897 gi2276199 V T D A R D H L K M Q N S K L F I S S F V R D N L K Y D L I P K A A R - S L I J T S T V V E K M K Q L Y P G K S G I V Y C L S 721 * :**: . . : : * ** * .**** : :** . : :: :* : : : :* _***.**** <r ... 
gil705486 R R E C D T M A D T L Q R D G L A A L A Y H A G L S D S A R D E V Q Q K W I N Q D G C Q V I C A T I A F G M G I D K P D 957 gi2276199 R K E C E T V Q M M L T K A G L S A E V Y H A G L N D N L R V S V Q R S W I A N - K F D V I C A T I A F G M G I D K P D 780 * . * * . * . * . * * . * _*****_*_ * _**._** . .**************** ... Domain I V - > <- Domain V gil705486 VRFVIHASLPKSVEGYYQESGRAGRDGEISHCLLFYTYHDVTRLKRLI *1MEKDGNHHTRE 1017 gi2276199 VRFVIHYSLPKSIEGYYQETGRAGRDGMESYCLMLYSYHDSIRLRRMI3E-GNTTTGVRS 839 ****** * * * * * . * * * * * * . * * * * * * * * . * * . . * . * * * * * . * . * . _ _* <- Domain V I - > NGFNPDFCKKHPDVSCDNCCKTKD YK 1073 gil705486 1HFNNLYSMVHYCENITECRRIQLLAYF E gi2276199 1^ HLNNVLQVVAYCEKTVSVCRRKMLVEHFGEV-YDEQSCRNS-KTPCDICERQRKNAEAlR 8 97 .* * * * * . . *** * . . * ** . . . * gil705486 TRDVTDDVKSIVRFVQEHSSSQGMRNIKHVGPSGRFTMNMLVDIFLGSKSAKIQSGIFGK 1133 gi2276199 LFDVSTDALSILKCLPRMQKAT-LKYISELYRGALIKKSQEQAMRLGHTKLPFYS K 952 gil705486 GSAYSRHNAERLFKKLILDKILDEDLY--INANDQAIAYVMLGNKAQTVLNG 1183 gi2276199 GQGMSEQDALRFVRKLVIEGYIHERLYSVPNQAAAVFAYAELTEAGRDLANGKKTAKVYL 1012 gil705486 NLKVDFMETENSSSVKKQKALVAKVSQRE-EMVKKCLGELTEVCKSLGKVFG 1234 gi2276199 HIVTCERKRKNAGLIELSNMNIVSEAQALKERHMVKHGDVFTRCLQDLTHLITAVAESSG 1072 . .* * * . . * * . . .. . * * . * * . . . . * 137 Figure 34. CLUSTALW Pairwise Comparison ofthe BLM and T04A11.6 Orthologs (cont'd) gil705486 VH-YFNIFNTVTLKKLAESLSSDPE^rLLQIDGVTEDKLEKYGAEVISVLQKYSEWTSPAE 1293 gi 2 2 7619 9 LSGPYSIVSREGIEQIAALLPRTNSI (LLRIDSMTQIKVTKYGRLIMELLATY WKQVDERE 1132 : : . * . . : : : : * * . . ** : ** . : * : . . _ . *—"TT . * gil705486 DSSPGISLSSSRGPGRSAAEELDEEIPVSSHYFASKTRNERKRKKMPASQRSKRRKTASS 1353 gi2276199 EEEMRNQLDKLK SG E--IVMGGFATLQSD PGFPSVPYMKPLGG 1173 gil7054 86 GSKAKGGSATCRKISSKTKSSSIIGSSSASHTSQATSGANSKLGIMAPPKPINRPFLKPS 1413 gi2276199 GGGCRG-RGKKRAFSGFS SGR ATKKPRATAPSARGKTSGRGGAKPATSLKR-NMYPA 122 8 gil705486 YAFS 1417 gi2276199 TSM- 1231 Figure 3 5. 
BL AST2 Pairwise Comparison of the RECQL and K02F3.1 Orthologs Query: gil082337 Length 649 Subject: gill76565 Length 883 Score = 480 bits (1222), Expect = e-134 Identities = 238/476 (50%), Positives = 314/476 (65%), Gaps = 9/476 (1%) Query: 4 VSALTEELDSITSELHAVEIQIQELTERQQEXXXXXXXXXXXXXXCL-EDSDAGASNEYD 62 +S L+ EL + E+ ++ QI +L ++ E EDSD Sbjct: 406 LSKLSTELADLDGEIGQIDQQISQLRRKKSELTQKRQAIERKIELKTNEDSDWTDR 462 Query: 63 SSPAJiWNKEDFPWSGKVKDILQWFKLEKFRPLQLETINVTMAGKEVFLVMPTGGGKSLC 122 W+++ FPWS + IL+ F LEKFRPLQ IN M+ ++ +++ TGGGKSLC Sbjct: 463 WDRDGFPWSDEATKILKEQFHLEKFRPLQRAAINAVMSKEDAWILSTGGGKSLC 517 Domain I -> Query: 123 YQLPALCSDGFTLVICPLISLMEDQLMVLKQLGISATMLNASSSKEHVKWVHAEMVNKNS 182 YQLPAL ++G LV+ PLISL+EDQ++ L+ LGI ++ LNA++SKE K V + NK+S Sbjct: 518 YQLPALLANGLALWSPLISLVEDQILQLRSLGIDSSSLNANTSKEEAKRVEDAITNKDS 577 Domain l a Query: 183 ELKLIYVTPEKIAKSKMFMSRLEKAYEARRFTRIAVDEVHCCSQWGHDFRPDYKALGILK 242 + +L+YVTPEK+AKSK M++LEK+ IA+DEVHCCSQWGHDFR DY L +LK Sbjct: 578 KFRLLYVTPEKLAKSKKMMNKLEKSLSVGFLKLIAIDEVHCCSQWGHDFRTDYSFLNVLK 63 7 ... -> ^-Domain II -> Query: 243 RQFPNASLIGLTATATNHVLTDAQKILCIEKCFTFTASFNRPNLYYEVRQKPSNTEDFIE 302 RQF ++GLTATAT++VL D + +L 1+ TF A FNR NL Y+V QKP + ++ E Sbjct: 638 RQFKGVPILGLTATATSNVLDDVKDMLGIQAALTFRAGFNRSNLKYKWQKPGSEDECTE 697 4r Domain III -> 138 Figure 35. 
BLAST2 Pairwise Comparison of the RECQL and K02F3.1 Orthologs (cont'd) Q u e r y : 303 DIVKLINGRYKGQSGIIYCFSQKDSEQVTVSLQNLGIHAGAYHANLEPEDKTTVHRKWSA 362 +1 K I + GQ+GIIYC S+ D E+V +L++ G I A YHA +EP D++ H+ W + S b j c t : 698 EIAKTIKRDFAGQTGIIYCLSRNDCEKVAKALKSHGIKAKJHYHAYMEPVDRSGAHQGWIS 757 Domain IV -> Q u e r y : 363 NEIQVWArVAFGMGIDKPDVRFVIHHSMSKSMENYYQESGRAGRDDMKADCILYYGFGDl 422 + IQV+VATVAFGMGIDKP+VRFVIHHS+ KS+ENYYQESGRAGRD A C I L Y Y D Sbj C t : 758 GKIQVIVATVAFGMGIDKPNVRFVIHHSLPKSIENYYQESGRAGRDGQl'ATCILYYRLAD 4- Domain V -> <- Domain VI -> 817 Q u e r y : 423 S b j c t : 818 IFRISS^^^AraENVGQQKLYEMVSYCQNISKCRRVLMAQHFDEVWNSEACNKMCDNC IF+ SSMV E G Q L Y M V Y + S CRRV +A+HF + E W C K CD C IFKQSSMVQQERTGIQNLYNMVRYAADSSTCRRVKLAEHFEEAWEPSWCQKQCDTC 478 873 Figure 36. CLUSTALW Pairwise Comparison ofthe RECQL and K02F3.1 Orthologs g i l 0 8 2 3 3 7 g i l l 7 6 5 6 5 MDSAESELLEEEELPEIKYIDSAEFQNFDNNADQREIQFRWMEAFETKEPRWFRLLDEFI 60 g i l 0 8 2 3 3 7 g i l l 7 6 5 6 5 TPLSGKRGYCSRIVEKYARVLLHQGFELSQTDGPLRVAAIAASRDVGQAGNLPSVLNNLS 12 0 g i l 0 8 2 3 3 7 g i l l 7 6 5 6 5 EVTRHLNLAQYASTSLATVGQRKFANEWQKTVNQERLGCFHLFLIGYREGLENLLSNHR 180 g i l 0 8 2 3 3 7 g i l l 7 6 5 6 5 HLRNFRYSAKKRHDLLTEHIKNIFQERDSFIQAPEASSFALQTVLENWTFPLQNEQIVQL 240 g i l 0 8 2 3 3 7 g i l l 7 6 5 6 5 AHANTLQMRALHVKGDGFYFLNEALNVAIAFNWSSSQEEMEEFYSIFTPQATNPRGKRIF 3 00 g i l 0 8 2 3 3 7 g i l l 7 6 5 6 5 QFWKMISKYDRSNGNAQGFLVTAQRLEEESKSDDIGAGCVTAVELACKIRNQSKTIMDTI 360 g i l 0 8 2 3 3 7 SAS V S A L T E E L D S I T S E L 18 g i l l 7 6 5 6 5 TRHPSLGPRMMQLAERCGVGYENEEIIHQICDRFSLHPNLMSDVLLSKLSTELADLDGEI 420 .* *. ** . * . g i l 0 8 2 3 3 7 g i l l 7 6 5 6 5 GQIDQQISQLRRKKSELTQKRQAIERKIELKTNED HAlVEIQIQELTERQQELIQKKKVLTKKIKbCLEDSDAGASNEYDSSPAflWNKEDFPWSGK DWTDHWDRDGFPWSDE 78 473 139 Figure 36. CLUSTALW Pairwise Comparison of the RECQL and K02F3.1 Orthologs (cont'd) gil082337 gill76565 ** . . * . 
[Pairwise sequence alignment, continued from the preceding page: sequences with identifiers extracted as gil082337 and gill76565, annotated with helicase Domains I, Ia, II, III, IV, V and VI. Residue-level alignment omitted.]

Figure 37. BLAST2 Pairwise Comparison of the WRN and F18C5.2 Orthologs

Query: gil280208, Length 1432. Subject: gi0861396, Length 1056. Score = 394 bits (1002), Expect = e-108, Identities = 257/793 (32%), Positives = 415/793 (51%), Gaps = 55/793 (6%).

[Alignment of query residues 531 to 1309 against subject residues 210 to 961, annotated with helicase Domains I, Ia, II, III, IV, V and VI. Residue-level alignment omitted.]

Figure 38. CLUSTALW Pairwise Comparison of the WRN and F18C5.2 Orthologs

[Alignment of gil280208 (1432 residues) and gi0861396 (1056 residues), annotated with helicase Domains I, Ia, II, III, IV, V and VI. Residue-level alignment omitted.]

Figure 39. CLUSTALW Pairwise Comparison of the WRN and E03A3.2 Genes

[Alignment of gil280208 (1432 residues) and gil066920 (809 residues), annotated with helicase Domains I, Ia, II, III, IV, V and VI. Residue-level alignment omitted.]

Figure 40. CLUSTALW of Vertebrate WRN Orthologs

[Alignment of gil280208 (1432 residues), gi2130973 (1401 residues) and gi3420290 (1436 residues), annotated with the RNase D domain, helicase Domains I, Ia, II, III, IV, V and VI, and the HRDC domain. Residue-level alignment omitted.]

3.
HelicAceDb

"HelicAceDb" is an ACEDB-based genomics database incorporating the WRN and other helicase-related information compiled or initially derived from comparative genomic analyses undertaken during this thesis project, including phylogenetic trees represented by the novel ACEDB "dendrogram" display. The database was compiled using the Microsoft Windows version of ACEDB. It is anticipated that a copy of this database will be made available in computer readable form on CD-ROM and/or via a permanent "HelicaseWeb" site.

4. HelicaseWeb

In order to better integrate available information on WRN/DExH helicases in one place, and to provide a novel resource for helicase research, an experimental WWW database site, "HelicaseWeb", was initiated during this thesis project. This experimental site is ultimately meant to be a repository of information about helicase gene families, in particular the WRN-related DEAH helicases. The site is modeled somewhat upon the "ProWeb" model of Henikoff, Endow and Greene (1996). The site will also eventually incorporate an ACEDB for Windows WWW interface to "HelicAceDb", the ACEDB-based genomics database described above. This WWW client interface to ACEDB was not completely implemented during the thesis project. At the time of this writing, a permanent location on the WWW for the "HelicaseWeb" site has not been arranged.

IV. DISCUSSION

A. Fine Scale Genomic Map Construction Near WRN

1. Evolution of WRN Candidate Region Genomic Maps

Subsequent to the initiation of this thesis project, several research groups published a steady stream of genetic and physical mapping papers characterizing the WRN candidate region. In 1995, Ye et al.
published a microsatellite repeat marker, D8S1055, exhibiting no apparent recombination with WRN. This marker was experimentally employed in the construction of a long range physical map spanning the WRN region (Bruskiewich, Schertzer and Wood, 1997; this thesis). Three additional markers, LHRH (Bruskiewich et al., 1996; this thesis), D8S2297 (Bruskiewich, Schertzer and Wood, 1997; this thesis) and cos53C3PA (this thesis), were added to the repertoire of genetic markers in this region. By the time WRN was positionally cloned (Yu et al., 1996), a very large collection of additional polymorphic markers had been characterized across the WRN candidate region (Yu et al., 1996b; Goddard et al., 1996; Nakura et al., 1996). The WRN locus region is now well characterized by a dense series of genetic markers.

The two markers formally published during this thesis project, LHRH and D8S2297, were demonstrated by linkage and physical mapping to be of limited utility in refining the WRN candidate region. Ironically, this limitation places them distal to the large published collection of WRN region markers; hence, they could remain useful markers in other genetic studies. The LHRH polymorphic marker, in particular, may be useful in studies on the role of this gene locus in human reproduction and disease (see below).

Against this background of genetic linkage map refinement, physical map construction across the WRN region proceeded apace, both from genome wide efforts (Chumakov et al., 1995; Hudson et al., 1995) and from efforts specifically focused upon the WRN region (Chaffanet et al., 1996; Imbert et al., 1996; Yu et al., 1996b). This thesis project constructed a long range physical map spanning D8S339 (Bruskiewich, Schertzer and Wood, 1997; this thesis, see below) that correlates well with, and complements, other published WRN region maps.

2.
Linkage Mapping of LHRH

Luteinizing hormone-releasing hormone (LHRH) is a key neuroendocrine molecule in the hypothalamic-pituitary-gonadal hormonal system controlling human reproduction. Impaired function of this hormone may underlie such reproductive phenotypes as hypogonadism and precocious puberty (Cattanach et al., 1977; Mason et al., 1986). At the start of this thesis project (circa December 1993), it was noted that Yang-Feng et al. (1985) had assigned the LHRH locus to 8p21-p11.2 by in situ hybridization and somatic cell hybrid analysis using a cloned cDNA probe (Seeburg and Adelman, 1984; Adelman et al., 1986); however, a precise genetic and physical location for this locus was unknown.

A preliminary review of the literature on the Werner syndrome phenotype suggested that hypogonadism was a cardinal feature of a significant percentage of, but not all, cases of the disease. This suggested the hypothesis that a contiguous gene deletion, also affecting the LHRH locus, might be a mechanism of mutation in WRN. In hindsight, this hypothesis was somewhat naive in its biological justification, and it was firmly refuted by the experimental determination of the genetic map position of LHRH, which placed the locus outside of the candidate region for WRN (i.e. distal to D8S137; Bruskiewich et al., 1996). Radiation hybrid mapping by Oshima et al. (1994) confirms this result. More careful reading of the literature (Imura et al., 1985) indicates that the levels of luteinizing hormone (LH) and follicle stimulating hormone (FSH) are in fact prematurely elevated in Japanese Werner syndrome patients with respect to matched controls, with LHRH activity comparable to that of normal aged subjects. Thus, LHRH loss of function is likely unrelated to the hypogonadism phenotype of Werner syndrome.

Despite the failure of this hypothesis, the LHRH marker provides a new, highly polymorphic marker for clinical researchers undertaking genetic studies of this locus.
For example, considerable literature indicates that LHRH and its receptor are expressed locally in a variety of reproductive cancers (Emons et al., 1998; Yin et al., 1998) and that LHRH may inhibit tumor-cell proliferation in an autocrine/paracrine manner (Harris et al., 1991; Szende et al., 1991; Limonta et al., 1993; Irmer et al., 1994, 1995) by stimulating apoptosis via a ligand called "Fas" (Imai et al., 1998). It is interesting to note that a tumor suppressor inactivated in certain reproductive cancers is postulated to reside on chromosome 8p (Kerangueven et al., 1995). This thesis project provides a highly polymorphic marker with which to test the possible genetic role of LHRH in the development of such reproductive cancers, should familial susceptibility to such cancers be uncovered in specific pedigrees.

3. A Chromosome Wide Strategy for Simple Tandem Repeat Polymorphism (STRP) Isolation

The second strategy in genetic map refinement undertaken in this thesis project was to bin cosmids from a whole chromosome 8 library into regional sub-libraries using inter-Alu products from various somatic and radiation hybrids, hybridized against high density filters of the cosmid library. The resulting cosmid sub-libraries were screened for candidate STRPs. A novel, highly informative polymorphic STS, D8S2297, was developed via this strategy. D8S2297 ultimately did not appear to reside closer to WRN than D8S339; nevertheless, this marker of high heterozygosity may help to integrate the genetic and physical maps in this region of chromosome 8p.

Although the above Alu probe/hybrid protocol successfully yielded a marker in the target region, the protocol was somewhat costly and inefficient due to extensive chromosome fragmentation in, and limited STS characterization of, the chosen hybrids. This appeared to result in significant false positive errors in the physical map bin assignment of cosmids in the sub-library.

4.
A Multi-point Genetic Linkage Map

Since LHRH and cos53C3PA appeared to show zero recombination with D8S5, a reasonable question was whether one could deduce the order of these three markers by combining linkage data, i.e. by performing a multipoint analysis with all three loci. Such an analysis using CRIMAP is summarized in Table 13. A comprehensive multipoint analysis, against CEPH 7.1 reference map markers, of the three STRP markers completely characterized in this thesis project (LHRH, cos53C3 and D8S2297) gives a slight LOD bias towards the map order:

D8S136 - D8S5 - LHRH - cos53C3PA - D8S137 - D8S2297 - D8S87 - FGFR1

This thesis project did not generate any other specific experimental evidence to resolve the relative ordering of LHRH, cos53C3PA and D8S5.

5. A Long Range Physical Map

This thesis project's STS content analysis, combined with data published by others (Chumakov et al., 1995; Hudson et al., 1995; Chaffanet et al., 1996; Imbert et al., 1996; Yu et al., 1996b), supports a YAC contig order and orientation of 8pter - 844e2 - 807f11 - 896f4 - 8cen. The long range restriction map of this contig constructed in this thesis provides hybridization based evidence for a locus order of 8pter - D8S137 - D8S2297 - D8S1055 - D8S339 - GTF2E2 - GSR - PPP2CB - D8S87 - 8cen.

This map sets an upper limit upon the physical separation between D8S2297, D8S339 and D8S1055 by locating these markers within a single 1,400 kb YAC, 844e2, and places D8S339, GTF2E2, GSR and PPP2CB on a second 1,140 kb YAC, 896f4, which has also been observed to contain WRN (Yu et al., 1996a,b). Moreover, the distal edge of the GTF2E2, GSR and PPP2CB gene cluster is located only 220 kb proximal to D8S339. Given the putative chimerism of YAC 807f11, the distance separating D8S2297 and D8S1055 is indeterminate, because D8S2297 is mapped on the basis of lying distal to the telomeric end of 807f11.
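The upper-limit argument above can be illustrated with a short sketch. It uses only the YAC sizes and marker content quoted in the text; the dictionary layout and the function name `max_separation_kb` are illustrative assumptions, not software from this thesis, and the bound holds only if each clone is a single contiguous, non-chimeric fragment.

```python
# YAC sizes (kb) and STS content as quoted in the text. The data layout and
# the function name are hypothetical, chosen only for this illustration.
YACS = {
    "844e2": {"size_kb": 1400, "markers": {"D8S2297", "D8S339", "D8S1055"}},
    "896f4": {"size_kb": 1140, "markers": {"D8S339", "GTF2E2", "GSR", "PPP2CB"}},
}


def max_separation_kb(marker_a, marker_b, yacs=YACS):
    """Return an upper bound (kb) on the distance between two markers.

    Assumes every YAC is one contiguous, non-chimeric genomic fragment, so
    two markers on the same clone cannot be further apart than the clone
    size. Returns None when no single clone carries both markers.
    """
    sizes = [
        yac["size_kb"]
        for yac in yacs.values()
        if {marker_a, marker_b} <= yac["markers"]
    ]
    return min(sizes) if sizes else None


print(max_separation_kb("D8S2297", "D8S1055"))  # 1400
print(max_separation_kb("D8S339", "PPP2CB"))    # 1140
print(max_separation_kb("D8S2297", "GSR"))      # None
```

A bound of this kind is only as reliable as the integrity of the clone, which is why the putative chimerism of 807f11 matters for the D8S2297 to D8S1055 interval.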
Hybridization data (Table 17, Figure 23 and Figure 26) support the notion that the centromeric end of YAC 807f11 is a well preserved genomic fragment of chromosome 8p clearly overlapping the other YACs in the region. No such confirming data were obtained, though, for the telomeric end of 807f11, so the possibility cannot be ruled out that this other end is deleted for D8S2297 by recombination with a chromosome 20 genomic fragment. On this basis, D8S2297 and D8S1055 could be closer together than otherwise indicated on this map.

6. Candidate Genes in the Region

One locus studied in this thesis project, GTF2E2, was initially discovered by Mike Schertzer of the Wood lab during a strategic screening of sequences flanking AscI restriction sites within the LA08NC01 cosmid library. The recognition sequence of this enzyme, 5'-GGCGCGCC-3', would be especially prevalent within the CpG islands thought to commonly flank some classes of genes (Wicking and Williamson, 1991). GTF2E2 and the two other gene loci studied in this project, GSR and PPP2CB, were physically localized by others into the WRN region (Nakura et al., 1994; Imbert et al., 1996; Hudson et al., 1995).

GTF2E2 (MIM# 189964) encodes the p34 (beta) subunit of general transcription factor IIE. The GSR locus (MIM# 138300) encodes glutathione reductase, an enzyme involved in the metabolism of glutathione. PPP2CB (MIM# 176916) encodes the beta isoform of the catalytic subunit of protein phosphatase 2A (PP2A), a signal transduction enzyme implicated in many metabolic and cell cycle processes. Some indications point to involvement of PP2A in interleukin-1 (IL1) signal transduction. IL1 is a cytokine involved in inflammatory processes. The receptor for IL1 is expressed in tissues affected in Werner syndrome. PP2A may also modulate p53 phosphorylation, suggesting a possible explanation for the WRN genomic instability phenotype. PPP2CB is developmentally regulated in a way which might have explained the adolescent onset of the disease.
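As an aside on the AscI screening rationale described above, the following minimal sketch shows why the AscI site is associated with CpG-rich sequence: the 8 bp recognition sequence itself contains two CpG dinucleotides. The observed/expected CpG ratio used here is a commonly used heuristic and is an assumption of this sketch, not a measure applied in the thesis; the toy sequence and function names are hypothetical.

```python
ASCI_SITE = "GGCGCGCC"  # AscI recognition sequence quoted in the text


def find_sites(seq, site=ASCI_SITE):
    """Return 0-based start positions of every occurrence of `site`."""
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)  # step by 1 to allow overlaps
    return hits


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


# Hypothetical toy sequence, not taken from the LA08NC01 library.
toy = "ATATGGCGCGCCTTAACGCGGGCGCGCCAT"
print(find_sites(toy))       # [4, 20]
print(ASCI_SITE.count("CG"))  # 2
print(cpg_obs_exp(toy))
```

In practice a screen of this kind would run over cosmid end sequences rather than a toy string, and any threshold on the ratio would need to be chosen and justified separately.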
Prior to the actual positional cloning of WRN reported in April 1996, PPP2CB represented a strong candidate for the WRN locus, on the basis of both its determined map position and biological nature. A long range map published (Bruskiewich, Schertzer and Wood, 1997), and further described in this thesis, refined the fine scale physical map positions of these three biologically interesting genes relative to WRN and STSs in the region. Positional cloning (Yu et al, 1996a,b) mapped the WRN locus to the proximal end of the clone 896f4, the mega-YAC characterized in some detail in this thesis.

B. Positional Cloning of WRN

1. WRN, A Putative Helicase

Exon trapping, cDNA selection, and extensive sequencing allowed identification, by the G. Schellenberg lab in Seattle, of a candidate DNA helicase. This positional cloning effort found four mutations in Werner syndrome patients that likely result in a loss of gene function due to truncations in the C terminal region of the protein (Yu et al, 1996). The 1432 amino acid protein predicted from the Werner syndrome gene cDNA sequence contains a region with a strong sequence similarity to the helicase motifs of the E. coli RecQ-like DNA helicase gene family (Yu et al, 1996). This gene family includes the gene for another human disease, Bloom syndrome, plus several genes characterized in model organisms such as yeast and Caenorhabditis elegans. All members of the RecQ family have seven well conserved consensus helicase motifs (Gorbalenya and Koonin, 1993), including the putative ATP-binding domain in motif I and the DEXH sequence (where X may be any amino acid) in motif II that is thought to bind Mg2+ and interact with the ATPase motif. The initial identification of the Werner syndrome protein as a putative helicase reveals little about its function, as it may be involved in DNA replication, repair, transcription, recombination, chromosome segregation, or any activity requiring unwinding of the DNA (Yu et al, 1996).
It is also unclear why the phenotypes of Werner syndrome and Bloom syndrome are so distinct from each other, given the extent of sequence similarity of their genes.

2. Human Helicase Functions

It is interesting to note the growing list of helicases being characterized at the heart of various genetic disorders (Table 22). There is a clear medical motivation to understand the diverse functions of this superfamily of genes. As a rule, the physiological roles of mammalian helicases remain poorly characterized, with a handful of exceptions, largely because of the lack of genetic analysis. Biochemical purification has led to the isolation of human DNA helicases I to VI from HeLa cell nuclei, and DNA helicases A and E from HeLa cell cytosolic extracts (Matson et al, 1994). Two of these helicases have been cloned and identified as previously known proteins: the Ku autoantigen involved in double-strand break repair (Tuteja et al, 1994) and nucleolin, an RNA helicase required for pre-ribosome assembly (Tuteja et al, 1995). The basal transcription factor TFIIH has been shown to carry two contrasting helicase activities, apparently due to its subunits XPB/ERCC3 and XPD/ERCC2, proteins that were previously known to be involved in a group of overlapping human DNA repair disorders: xeroderma pigmentosum, Cockayne's syndrome, and trichothiodystrophy (Schaeffer et al, 1993; Sung et al, 1993; Schaeffer et al, 1994; Drapkin et al, 1994). This work was significant in that it appeared to bring together two separate biochemical pathways, nucleotide excision repair (NER) and RNA transcription. In addition, CSB/ERCC6, which corrects the repair defect of Cockayne's syndrome complementation group B, is a putative helicase by sequence similarity (Troelstra et al, 1992). Finally, the Bloom's syndrome gene has recently been shown to be a member of the RecQ helicase family by similarity (Ellis et al, 1995).
In parallel with Werner syndrome, Bloom's syndrome also manifests a mutator phenotype consisting of increased chromosomal breakage and hyperrecombination at the cellular level and increased susceptibility to cancer at the clinical level. DNA repair appears to be intact in Bloom's and Werner syndromes, setting these apart from the transcription/nucleotide excision repair syndromes mentioned above. On the other hand, Bloom's syndrome patients suffer from a severe growth deficiency and immunodeficiency early on in life, do not ordinarily have clinical features of early aging, and develop many different types of neoplasia, without any predilection for mesenchymal tumors. Markedly elevated frequency of sister chromatid exchanges, peculiar chromosomal aberrations called quadriradial figures, prolonged DNA elongation rates, and abnormal replication intermediates have been documented (Chaganti et al, 1974; Hand et al, 1975; Friedberg et al., 1995).

[Table 22, listing helicases associated with genetic disorders: contents not recoverable from the source text.]

D. Bioinformatics Analysis of the DExH Gene Family

1. Gene Family Assignment and Functional Annotation

The third phase of this thesis involved bioinformatics analysis of the WRN DNA helicase.
This analysis reflected the following fundamental biological question: Based upon phylogeny and fine scale protein sequence analysis, to which gene (sub)family does a given gene (protein) sequence belong, and given its (sub)family assignment, what are its inferable functional characteristics? In particular, can one infer gene family groupings of genes and then identify highly conserved amino acid residues in multiple sequence alignments (MSA) which are (super/sub) family specific critical elements of protein structure and function? The gene family question relates directly to the semantic distinction between similarity, paralogy and orthology. Sequences are homologous if they share some extent of statistically significant sequence similarity. Sequences are paralogous if they coexist in the genome of a given organism and derive from some sequence (gene) duplication event in an ancestral organism. Paralogous loci are free, after the gene duplication event, to assume fully independent mutational and functional evolutionary paths, as long as at least one or the other gene retains all the critical functional features of the progenitor gene that are required as the organism evolves. Orthologous genes are genes in two distinct taxa which are separated in evolution solely by speciation events. Critical gene functions and structure are more likely to be conserved between orthologous genes, making them prime candidates for cross comparison of functional data between a model organism and the organism of interest (e.g. humans). The practical exploration of the gene family and ortholog identification problem was undertaken in this thesis project as an attempt to functionally annotate the WRN gene family of DExH/D helicase genes. This activity initially involved the compilation and integration of existing WWW accessible database information pertaining to the helicase genes. 
This was followed by the application of bioinformatic tools to analyse the data and to construct a molecular phylogenetic tree derived from a MSA, in order to ascertain gene (sub)family membership. An implicit objective in this phase of the project was to integrate analysis and representation of data with specific software tools developed during my doctoral thesis project: ACEDB for Microsoft Windows and a novel ACEDB dendrogram display for representing comparative genomic/phylogenetic data.

2. Approaches to Assessing Gene Family Membership

The initial step most biologists take to discover the functional nature of a novel gene sequence is to search a sequence database. For this task, the unknown gene (DNA or amino acid sequence) is given as a query input for a suitable search program, most commonly FASTA (Pearson and Lipman, 1988) or BLAST (Altschul et al. 1990, 1997). Such database searches provide a quick return in assessing whether or not the novel gene resembles any other known gene. Often, this is enough to provide major insights into putative gene function. In most other situations, one is simply able to identify conserved functional domains which classify a novel gene into some broad category of biological activity (e.g. kinase, helicase, transporter, receptor), and further detailed analysis is required. BLAST results may be collected across diverse taxa in order to define ortholog mappings, as is done, for example, in the Clusters of Orthologous Groups approach, which uses cluster analysis of pairwise similarity scores ("COGS"; Tatusov, Koonin and Lipman, 1997), or in XrefDb (Bassett et al. 1997). These approaches, however, appear to lack precision and often group genes solely on the basis of anonymous sequence homologies which don't, a priori, give a precise indication of specific gene function.
Also, such analyses tend to be quite coarse in their gene (sub)family assignments, ignoring the fine structure of orthology and function within multi-paralog gene families.

An alternative approach is to focus upon the structural character of conserved sequence motifs. The most conservative approach is to characterize motifs with an experimentally defined biological function (e.g. active sites for biochemical activities) by expert database curation (PROSITE, Bairoch, Bucher and Hofmann, 1997). One can take such analysis one step further by classifying conserved subsequences by their presence in modules composed of a set of ordered conserved sequence motifs within the context of well characterized genes (BLOCKS, Henikoff and Henikoff, 1994; PRINTS, Attwood et al, 1998). Although, in this latter approach, some motifs in the module set may correlate back to experimentally determined biological activities, it is not necessary that all such motifs have an assigned function. Rather, the utility of maintaining such a database of motifs stems from the high sequence conservation observed within a module or structural context shared by many genes across taxa. Such genes may themselves not yet be fully characterized functionally and may be otherwise somewhat diverged in sequence identity outside the conserved motifs. From such databases, one may construct MSAs of genes sharing the motifs, modules or domains, then define empirical profiles of amino acid residue probabilities at each position in the MSA, for use in searching sequence databases for other genes exhibiting similarly conserved sequences.
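The idea of an empirical profile of residue probabilities at each MSA position can be illustrated with a minimal sketch. It is not the procedure used by BLOCKS, PRINTS or any other database named above: the function names, the uniform pseudocount, the uniform background frequency and the four-residue toy alignment are all assumptions made for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Return one residue-probability dict per alignment column.

    `msa` is a list of equal-length aligned sequences. Gap characters
    and other non-standard symbols are ignored when counting. The
    uniform pseudocount is an illustrative smoothing choice.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, segment, background=1.0 / 20):
    """Sum of per-column log-odds of `segment` against a uniform background.

    `segment` must contain only the 20 standard residues and is scored
    column by column against the start of the profile.
    """
    return sum(
        math.log(column[aa] / background)
        for column, aa in zip(profile, segment)
    )


# Hypothetical toy alignment of a motif II-like segment (not thesis data).
toy_msa = ["DEAH", "DEAH", "DECH", "DEVH"]
toy_profile = build_profile(toy_msa)
print(log_odds_score(toy_profile, "DEAH"))
print(log_odds_score(toy_profile, "KKKK"))
```

A candidate sequence would be scanned window by window, with high-scoring windows reported as possible instances of the motif. Real profile methods use background frequencies estimated from sequence databases and more elaborate priors than a single pseudocount.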
The bioinformatics portion of this thesis project explored the use of some of the aforementioned tools in the characterization of the WRN/DExH DNA helicase family; however, one fundamental limitation of most such analysis is that these systems generally succeed only in fitting new genes into an existing framework of curated gene classifications, but rarely generate new hypotheses about the specific (sub)family membership status of a new gene, including the possible modification of "accepted" gene family classifications by the splitting or merging of (sub)families, or the definition of new (sub)families. In most cases, new classifications must be manually curated into the databases based upon expert scientific assessment of data inconsistencies or failures to properly classify a specific novel gene sequence.

3. Dynamic Modeling of Gene Family Membership

Some efforts have been made to partially automate the process of characterizing gene families by developing new stochastic paradigms of sequence analysis. One such method is Hidden Markov Models (HMM; Krogh et al, 1994; PFAM, Sonnhammer, Eddy and Durbin, 1997). A second such method is Bayesian tree estimation (Bete; Sjolander, 1998a,b). Both of these methods apply probabilistic techniques, such as Dirichlet Mixture Priors, to a training set of MSAs, in order to reveal the extent of amino acid residue conservation at specific locations in the MSA. Such information can then be used to infer, or test predictions of, gene (super/sub)family membership of other sequences. The relative power of these methods is that they can take a collection of sequences for which moderate evidence of sequence relatedness has been established - say, by BLAST similarity searches - and proceed to organize the dataset based upon a more detailed statistical analysis.
The output may be a probabilistic computational model (for HMMs) or a phylogenetic tree with associated residue conservation statistics (for Bete). The relative limitation of these techniques is that they both require a reasonably good quality MSA as input data. Thus the statistics output from these methods are only as good as the MSA. For this thesis project, the kind collaboration of Dr. Kimmen Sjolander was obtained, to apply Bete analysis to a collection of DExD/H helicase sequences.

4. ClustalW

The multiple sequence alignment and basic phylogenetic tree building program selected for use in this thesis project was ClustalW (Thompson, Higgins and Gibson, 1994; Higgins, Thompson and Gibson, 1996). ClustalW is a public domain program which constructs multiple sequence alignments by pairwise comparison of sequences, followed by a nearest neighbour tree merge of sequences. Users have the choice of using dynamic programming (Smith and Waterman, 1981) or the method of Wilbur and Lipman (1983) to perform the pairwise comparison. The program also implements an algorithm for bootstrap assessment of phylogenetic trees. Bootstrapping is a method for deriving confidence values for the groupings in a tree (i.e. the subtree defined subsets of the total sequence set). It involves making N random samples of sites from the alignment, drawing N trees (one from each sample) and counting how many times each grouping from the original tree occurs in the sample trees. Some of the trees generated in this thesis were bootstrapped to provide a measure of the quality of the molecular phylogenies generated.

5. Phylogenomics

All phylogenetic analyses use observations of extant species, genes and sequences, not the true ancestors of these entities. Nonetheless, a possible gene sequence evolution may be approximately inferred from molecular phylogenetic trees derived from MSAs.
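The bootstrap procedure described in the ClustalW section above (resample alignment sites, rebuild a tree from each sample, count how often each original grouping recurs) can be sketched as follows. This is not ClustalW's implementation: the function names are hypothetical, and the "tree builder" is a deliberately trivial stand-in that reports only the single closest pair of sequences, so that the example stays self-contained.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement (one bootstrap sample).

    `alignment` maps sequence names to equal-length aligned strings.
    """
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


def closest_pair_grouping(alignment):
    """Toy stand-in for a tree builder.

    Returns a set holding one grouping: the pair of sequences with the
    fewest mismatches. A real analysis would build a full tree and
    return every clade it defines.
    """
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]

    def mismatches(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))

    return {frozenset(min(pairs, key=mismatches))}


def bootstrap_support(alignment, build_groupings, n_replicates, seed=0):
    """Count, for each original grouping, the replicates that recover it."""
    rng = random.Random(seed)
    original = build_groupings(alignment)
    support = {grouping: 0 for grouping in original}
    for _ in range(n_replicates):
        replicate = build_groupings(resample_columns(alignment, rng))
        for grouping in original:
            if grouping in replicate:
                support[grouping] += 1
    return support


# Hypothetical toy alignment (not thesis data).
toy = {"seqA": "DEAHKL", "seqB": "DEAHRL", "seqC": "DECHKV", "seqD": "NQVHRV"}
print(bootstrap_support(toy, closest_pair_grouping, n_replicates=100))
```

Dividing each count by the number of replicates gives the bootstrap proportion usually shown on tree branches.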
The utility of such evolutionary trees is that one can potentially predict the function of genes by overlaying known functions onto the tree, then assessing the functions of unknown genes based upon the proximity of such gene sequences to other genes of known function in the tree. This type of approach has been coined "phylogenomics" (Eisen, 1998). The construction of gene families using any tree building technique automatically lends itself to such analysis. In the current thesis project, a modest attempt was made to interpret subfamily groupings of helicases in this manner, from ClustalW and Bete trees. A related question in comparative analysis is how to construct a one-to-one ortholog mapping between paralogous genes in model organisms such as Caenorhabditis elegans and the paralogous genes in, say, Homo sapiens. Accurately making such an ortholog mapping might be expected to facilitate the identification of critical residues conserved for functional reasons only within a single particular subfamily. Also, with an accurate ortholog mapping, one can study a given homolog of a gene in the model organism, more confident that functional insights obtained from observations in one model organism might be accurately extrapolated to the defined orthologs in other organisms. With the complete sequencing of the smaller genomes of prokaryotes and yeast, this problem occasionally has a trivial solution when there is but one highly significant database match for the gene (sub)family in question. This situation may be true of the RecQ DEAH helicases in some species. For example, in Saccharomyces cerevisiae, only one significant RecQ homolog, Sgs1, is known (Gangloff et al. 1994; Watt et al, 1995). For higher organisms, gene duplication and evolution have expanded gene families and permitted sub-family specialization, obscuring unique structure-function relationships within larger families of paralogous genes.
Accurate ortholog mapping between such gene families across species becomes more problematic, yet more critical for correct genetic functional analysis.

6. Visualization of Molecular Phylogenies

A novel tool for visualizing phylogenetic trees within the context of a genomic database (ACEDB), specifically, the new "dendrogram" display for ACEDB, was developed during this thesis project. This new data modeling and graphics function (described further below) permits the representation of ClustalW and Bete protein trees, and, in the case of ClustalW generated trees, bootstrap values.

7. Sequence Analyses of the WRN/DExH Gene Family

a) WRN is a DEAH Helicase

As mentioned previously, WRN is a putative member of the large gene family of nucleic acid dependent, NTP hydrolyzing enzymes called helicases, which are involved in the replication, recombination, transcription and translation of nucleic acids. On the basis of sequence similarity, helicases have been classified into five superfamilies (Gorbalenya and Koonin, 1993); however, sequence similarity across these superfamilies is low and largely restricted to two signature domains (helicase motifs I and II) required for NTP binding. Based upon sequence similarity, WRN is postulated to be a member of the RecQ subfamily of DEAH helicases belonging to the SF2 helicase superfamily (Yu et al. 1996). The SF2 superfamily is characterized by seven conserved helicase signature sequence domains. The DEAH helicase subfamily is so named due to the strong conservation of a domain II sequence of aspartate-glutamate-alanine-histidine (DEAH).

b) Identification of WRN Orthologous Loci in Model Organisms

Immediately subsequent to the positional cloning of WRN (Yu et al. 1996), bioinformatics analysis of WRN/DExH related DNA helicase homologs was initiated.
In particular, BLAST searches of the Caenorhabditis elegans genome (finished and unfinished) sequences revealed the existence of several possible homologous sequences which might represent the candidate nematode locus orthologous to WRN. The starting point for the final bioinformatics phase of this thesis was a TMBLAST screen of the entire non-redundant (nr) GenBank protein database (as of summer/fall 1998), using WRN as the seed query. It was observed that the database is not as "non-redundant" as the name implies. Several sequences pulled out were duplicate or near duplicate sequences. There are likely many polymorphisms, sequencing errors and splice variants populating the so-called non-redundant database. Could functional information be inferred from, say, splice variants, if they are preserved across taxa, hence across orthologous loci? In general, ClustalW did a good job of clustering protein sequences into reasonable subfamilies. For example, most of the eukaryote initiation factor 4 (eIF4) proteins are found within the same subtree. One interesting exception is the eIF4A designation for a Methanobacterium thermoautotrophicum helicase found in the RHLP helicase subfamily tree (Figure 30(b)). The simplest explanation for this observation is that this protein was misannotated. This example suggests that molecular phylogenetic analysis can serve to properly annotate genes (and fix errors in annotation). The detailed comparative analysis of helicase protein sequences undertaken in this thesis constructed a clearer picture of the possible ortholog mapping between paralogous families of RecQ helicases across largely diverged taxa, in particular, between model organisms such as Caenorhabditis elegans and humans (Homo sapiens). Specifically, the results of this analysis clearly designate the worm loci T04A11.6 and K02F3.1, respectively, as the true orthologs of BLM and RECQL.
The F18C5.2 locus was mapped onto WRN with some degree of confidence, although an alternate data analysis (Bete) suggested a moderately closer similarity between E03A3.2 and WRN, in conflict with the ClustalW results. Close examination of domain VI and other subsequence pairwise alignments suggests that the ClustalW orthology assignment is correct. It is interesting to contrast the bioinformatics results obtained in this thesis project with public database sources of ortholog mappings. Clusters of Orthologous Groups ("COGS"; Tatusov, Koonin and Lipman, 1997) does not construct as detailed an ortholog mapping (for helicases) as the one in this thesis. As such, COGS does not appear to represent an improvement over simple BLAST searches (i.e. COGS simply states that WRN is a RECQ helicase). The public database XrefDB (Bassett et al, 1996), which attempts to assign loci from model organisms to specific human disease genes, specifies an ortholog mapping for WRN (E03A3.2 rather than F18C5.2) and BLM (K02F3.1 rather than T04A11.6) different from those ascertained in this thesis (above). It is apparent from a detailed inspection of key "signature" subsequences in the MSAs of the nematode and human sequences concerned that the ortholog mappings determined by this thesis project are likely the correct ones. This observation underscores the necessity of constructing and verifying sufficiently rigorous MSAs (e.g. using ClustalW) and constructing molecular phylogenetic trees, prior to specifying ortholog mappings between gene loci in model organisms and human beings.

c) Identification of Critical Residues

Since the identification of WRN, the search has been on for characteristic and functionally significant sequence motifs outside the canonical helicase domain.
Immediately prior to the final bioinformatics phase of this thesis, a comparative sequence analysis of ribonucleases, using a Hidden Markov Model generalized with Dirichlet Mixture Priors, uncovered a domain shared between the RNase D family and WRN (Mian, 1997). A subsequent paper identified an additional putative nucleic acid-binding domain, HRDC, in WRN, BLM, RecQ and Sgs1 (Morozov et al, 1997). The bioinformatics analysis in this thesis project attempted to characterize additional sequence motifs. The strategy employed was one of phylogeny construction to characterize orthologous loci, which were then used to assess subfamily-specific conservation of residues across moderate to large evolutionary distances. Specifically, four proposed WRN related subfamily C. elegans/H. sapiens ortholog pairs were aligned using BLAST2 and ClustalW, to identify highly conserved ortholog-specific subsequences lying outside the canonical helicase motifs. C. elegans and H. sapiens are believed to have diverged in evolution about 450-550 million years ago (Ann Rose, personal communication); therefore, highly conserved subsequences in orthologous loci may indicate structurally or functionally significant regions of the protein. The analysis of critical residues in orthologous genes was not pursued beyond MSA identification of highly conserved residues. Clearly, additional detailed analyses could be undertaken. With time, the overall strategy of ortholog mapping could be refined and automated for application to complete genomes (e.g. C. elegans versus human), to generate the next generation of "BLOCKS" databases and "PFAM" models of conserved protein motifs, at the level of detail of ortholog map defined gene sub-families.

d) Assessment of WRN/F18C5.2 Ortholog Pair

MSA analysis appears to moderately favor an ortholog mapping of the C. elegans locus F18C5.2, rather than E03A3.2, onto WRN.
It is interesting to note, however, that detailed analysis of both nematode loci indicates that neither locus contains the RNase D homologous N-terminal domain observed in WRN (Mian, 1997). This observation likely represents a limitation in the use of the worm "ortholog" as an experimental system to study WRN, since the possible role of the WRN RNase domain cannot be tested in the worm! One possibility is that F18C5.2 works in conjunction with another protein with RNase D activity, to complete its functionality. Another possibility is that an RNase D domain "proofreading" activity in WRN represents a serendipitous but advantageous gain of function in evolution, increasing the efficacy of helicase function. A third possibility is that there remains yet one more helicase to be sequenced in the worm, highly similar to F18C5.2, but which includes the RNase domain at the N-terminal end, and hence is the "true" ortholog of WRN. An updated BLAST search of the nematode genome project sequence databases (October 1998) does not reveal such a locus. Given the effective completion of the nematode genome sequencing project at the time of this writing, this last possibility is effectively ruled out.

e) Assessment of WRN Vertebrate Orthologs

Mus musculus and Homo sapiens are both mammals, hence somewhat related in evolutionary time. Xenopus laevis is a vertebrate, albeit of somewhat greater evolutionary divergence from mammals. This is reflected in the high conservation of amino acid residues in the candidate frog, mouse and human WRN orthologs, including the presence of the N-terminal RNase D domain in each species (a domain which is absent in the nematode ortholog, F18C5.2). The Werner syndrome phenotype is characterized both by apparent premature senescence and increased cancer incidence.
In this context, it is thus interesting to consider the following two related research questions posed about mammalian senescence (Miller, 1995):

• Why are the lifespan and cancer susceptibility of mice (or other mammals) so dramatically different from those of humans?
• Why does cancer incidence correlate so strongly with the rate of aging in mammals, that is, why do short lived mammals (e.g. mice) show rapid progression of neoplasia, whereas long lived mammals (e.g. humans) generally exhibit slower oncogenesis?

Miller (1995) proposes that a strong coupling exists between cancer susceptibility and lifespan in mammals. This phenomenon may even be extendable to other vertebrates. Such coupling is not adequately explained by the cancer research paradigm that asserts that the high incidence of neoplasia in the elderly simply reflects the passage of time needed for a multi-hit cell transformation cascade. A corollary to this statement is that whatever retards aging to varying degrees in different animals must also slow the rate of oncogenesis in each animal to the same corresponding extent. Moreover, the size of the subset of genes with such global effects of retardation of aging and oncogenesis is likely to be small. In the context of WRN, there is an obvious question to pose. Are there detectable differences in the mouse (or other vertebrate) versus human orthologs for WRN that could place the gene into this hypothesized subset of genes affecting both lifespan and cancer incidence? This question can be seen as an echo of the Werner syndrome phenotype question. That is, what selective advantage does WRN confer upon organisms in terms of increased lifespan and suppression of neoplasia? Does the human WRN gene perform this functional role with greater efficacy than its mouse ortholog? If this is indeed the case, then what element of the WRN gene in humans, versus mice, reflects this efficacy?
This thesis project can only provide an intriguing hint of a possible answer to these questions, in the context of a MSA of the protein sequences of WRN candidate orthologs in Xenopus laevis, Mus musculus and Homo sapiens. Such analysis cannot address, for example, variations in gene expression caused by differences in the WRN promoter or similar regulatory mechanisms in humans versus other vertebrates. Are there sequence differences between these WRN orthologous loci worthy of attention? Using ACEDB "dotter" functionality, it was noted (Figure 28) that human WRN contains a direct repeat of a highly conserved subsequence in the N-terminal end of the protein. Examination of the pairwise comparison between the candidate WRN orthologous sequences in Xenopus, Mus and Homo (Figure 40) suggests that this sequence lies within a highly conserved region of high (>50%) residue identity in the protein, indicative of a possible structural or functional constraint. A "NDNEND" fragment of this subsequence ("TNEEKD" in Xenopus) appears to be somewhat hydrophilic (Figure 29), suggesting that this subsequence lies on the surface of the WRN protein, possibly mediating some interaction of WRN with either the substrate or other ancillary proteins. Assuming that this repeat in human WRN is real (i.e. not a sequence data entry error), it would be interesting to determine what functional role this subsequence plays in moderating the interactions between WRN substrate and the gene product, or the kinetics of protein-protein interactions with ancillary proteins. One would also wish to assess the quantitative effect that repetition of this sequence has upon this function. The extent of conservation of this subsequence in related mammalian species exhibiting a shorter lifespan than Homo sapiens should also be ascertained.
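The hydrophilicity observation for the "NDNEND" and "TNEEKD" fragments can be checked with a short calculation. The hydropathy scale behind Figure 29 is not stated in this section, so the sketch below assumes the widely used Kyte-Doolittle scale, on which negative values indicate hydrophilic residues; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the one used for
# Figure 29 is not stated here). Negative means hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over the residues of `peptide`."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


for fragment in ("NDNEND", "TNEEKD"):
    print(fragment, round(mean_hydropathy(fragment), 2))
```

Both fragments average well below zero on this scale, which is consistent with the suggestion of surface exposure. A full hydropathy plot would slide a window of fixed width along the whole protein rather than scoring isolated fragments.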
The alternative is that if WRN does participate in the extension of lifespan and the suppression of cancer, it may be gene expression rather than protein architecture which makes the difference. After all, one needs to account for very diverse lifespan/cancer susceptibility differences in a very wide range of organisms, a task easier to rationalize as changes in gene regulation. However, some gross effect, say in primates or human beings, due to advantageous changes in the structure of the WRN gene, cannot be ruled out.

E. HelicaseWeb

The concept of a gene-family specific WWW site is not new (Henikoff, Endow and Greene, 1996); however, no such site has been compiled for the DExH helicases (of which WRN is a putative member by sequence similarity), nor even for helicases in general. In light of this fact, the construction of such a WWW site was included in the final bioinformatics phase of this thesis. The overall content of the website is largely based upon material gleaned from other information resources on the internet, with the addition of a local ACEDB database (see next section) containing thesis-compiled WRN/DExH data and bioinformatic analyses.

F. HelicAceDb

In addition to static general gene family data and links to other web pages, it is possible to build a dynamic representation of gene related data, with good mapping and sequencing details. At the time of this writing, an ACEDB for Windows standalone implementation of such a research database for the WRN/DExH helicases is partially implemented. To meet the challenge of providing good representation of gene-family sequence and mapping resources within HelicaseWeb, the thesis ACEDB software development effort was partially directed towards the incorporation of a graphical Windows-based WWW client interface, communicating with a Windows NT ACEDB "back end" server.
Although this work was not completed prior to the writeup of this thesis project, significant progress was made in defining the requirements of such a WWW interface, and in becoming familiar with its technical requirements.
G. ACEDB for Windows '95/NT 4.0
Although initially ancillary to the primary thesis project, continued thesis-related work pertaining to the Microsoft Windows version of ACEDB inspired an integration of this work into the body of this thesis. It is therefore appropriate here to describe the scope of this software development work. ACEDB is an object oriented, non-commercial (freeware) database system originally designed to manipulate data from the C. elegans nematode genome mapping and sequencing project (Thierry-Mieg and Durbin, 1996). Due to the inherently generic and flexible design of the ACEDB database kernel, schema specification, data input/output facilities and user interfaces, it has been possible to adapt ACEDB for use in the research of other biological systems. In addition, since the ACEDB software package was designed with computing platform independence in mind, it had previously been ported onto a number of distinct UNIX platforms (SunOS, DEC/OSF1, Solaris, SGI and Linux), as well as the Apple Macintosh (read-only version). The result of an intellectual collaboration of many biologists and computer scientists over five years, the ACEDB database system is an innovative software application powerful enough to effectively handle voluminous and complex scientific data. The current user community spans several continents and includes major species-specific genome sequencing projects (including several within the Human Genome Project, e.g. Sanger Centre), major publicly-funded biotechnology (especially, plant genome) database services, and a number of biotechnology groups in private industry (e.g. see Miller, Fuchs and Lai, 1997).
1.
ACEDB to Windows Port Benefits
Although UNIX workstations and Apple Macintosh/PowerPCs hold a large market share within the scientific computing community, Intel x86 based PCs continue to dominate personal computing due to favorable pricing and marketing considerations. Although a version of ACEDB that runs in Linux (a UNIX variant available for Intel x86 PCs) existed in 1995, Linux was not yet in as widespread use as a PC operating system as Microsoft Windows. An early attempt to port ACEDB to MSDOS was unsuccessful, due to the limitations of that 16 bit operating system; however, in May 1995, at ACEDB '95 (the annual ACEDB developers' and users' conference/workshop) the feasibility of an Intel x86 PC port was re-evaluated in light of the following observations:
• The processing performance and storage capabilities of Intel x86 class PCs had increased significantly, with dramatically improved performance/cost ratios.
• Many Intel x86 PCs were now running Microsoft Windows, a relatively powerful graphical user interface (GUI) well supported by good software development tools and libraries.
• In 1995, the release of Windows '95 for widespread distribution within the general PC/Windows user community would transform the popular Windows PC platform into a full 32 bit, pre-emptive multitasking operating system environment suitable for the execution of more demanding software applications such as ACEDB.
It was thus concluded by several members of the ACEDB community at that time that a port of ACEDB onto the Windows (Win32) Intel PC platform would be desirable and feasible.
2. Implementation
As a consequence of my presence and initiatives taken during the ACEDB '95 meeting, I was invited by Dr. Richard Durbin (Sanger Centre, UK; a founding co-author of ACEDB) to initiate the Windows ACEDB software conversion project. I subsequently remained the project's principal development programmer of the Windows-specific ACEDB code.
Immediately following ACEDB '95, a text-only version of ACEDB was made functional within two weeks, proving the feasibility of the concept. Over the course of several additional months of part-time development work, the first graphical Windows interface to ACEDB was implemented and then released for "alpha" testing in January 1996. In June 1996, I spent two weeks at the Sanger Centre, Hinxton, UK, transferring the source code into the master ACEDB program source repository there. Additional "beta" releases of the software were made over the following year. In the spring of 1997, a rudimentary Windows NT client/server prototype was also initiated. A full official "production" release of the standalone, graphical version of ACEDB for Windows ("ACEDB for Windows") was made in August 1997, subsequent to ACEDB '97 at Cornell University. Additional releases of the software were periodically made throughout the subsequent thesis project period, with significant improvements in reliability and performance. Development of client/server versions and implementation of feature enhancements to the software continued during the past year.
During the final phase of this thesis project, an additional software development effort was made to conceive, design, implement and refine novel functionality within ACEDB in general, to support the task of gene family characterization. The primary outcome of this effort was the creation of a novel, non-platform specific ACEDB display, the "Dendrogram" tree display, which permits the visualization of linked object oriented trees, in the graph-theoretic sense of the word. The primary use of the display in this thesis project was to represent both taxonomic trees of organisms and protein sequence molecular phylogeny trees, including branch distances and bootstrap values, generated by ClustalW and Bete analyses.
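To make concrete the kind of data such a tree display must carry, the following minimal Python sketch models a phylogeny node with a branch distance and a bootstrap value, and serializes the tree in Newick notation, a common text format for phylogenetic trees. It is not the ACEDB data model nor the Dendrogram implementation: the class TreeNode, the helper functions, the leaf labels and every numeric value in the example tree are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """One node of a rooted tree (hypothetical structure, not ACEDB's)."""
    name: str = ""                       # leaf label, e.g. a sequence name
    distance: Optional[float] = None     # branch length to the parent node
    bootstrap: Optional[int] = None      # support value for an internal node
    children: List["TreeNode"] = field(default_factory=list)


def to_newick(node):
    """Serialize a subtree; bootstrap values become internal node labels."""
    if node.children:
        inner = ",".join(to_newick(child) for child in node.children)
        label = f"({inner})"
        if node.bootstrap is not None:
            label += str(node.bootstrap)
        else:
            label += node.name
    else:
        label = node.name
    if node.distance is not None:
        label += f":{node.distance:g}"
    return label


def leaf_names(node):
    """Return leaf labels in left-to-right order."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


def example_tree():
    """Build a toy three-leaf tree; all values are made up."""
    mammals = TreeNode(
        distance=0.3,
        bootstrap=100,
        children=[
            TreeNode(name="Hs_WRN", distance=0.1),
            TreeNode(name="Mm_WRN", distance=0.12),
        ],
    )
    return TreeNode(children=[mammals, TreeNode(name="Xl_FFA1", distance=0.4)])


if __name__ == "__main__":
    root = example_tree()
    print(to_newick(root) + ";")
    print(leaf_names(root))
```

Storing the distance and bootstrap value on each node, rather than in a separate table, keeps the structure recursive, so the same traversal serves both serialization and leaf enumeration.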
The success of the design effort for the dendrogram display is reflected in the number of uses in which the new display has already found expression in ongoing research at the Sanger Centre and elsewhere. The display has now been adapted by Sanger Centre personnel to the representation of nematode cell lineages, other protein phylogenies and organization of expression data images (Sanger Centre bioinformatics group, personal communications).
This doctoral candidate is solely responsible for all of the approximately 25,000 lines of platform-specific C and C++ program code required to make ACEDB run under the Microsoft Windows "WIN32" operating system. Over a thousand hours during the past three and a half years were devoted to developing ACEDB for Windows into a robust (relatively bug-free) and fully functioning version of ACEDB on the Microsoft Windows '95 ('98) and Windows NT 4.0 operating system platforms. The software is now in general production use in numerous scientific laboratories and private offices throughout the world. Personal communications with many scientific colleagues indicate that this version of the software has gained wide acceptance and popularity among ACEDB users worldwide. ACEDB for Windows does not currently embody all the functionality of its fully featured UNIX cousin; however, it is thought that about 90% of the core functionality of ACEDB is now fully and robustly represented in ACEDB for Windows. The biggest deficiencies in ACEDB for Windows are in two areas:
• External Program Interfaces: non-ACEDB programs have not been ported from UNIX.
• Client/Server Versions: these are in the development stages and lack good support for Windows/UNIX cross-communication and for WWW functionality.
Again, it is anticipated that a postdoctoral research project will permit additional ACEDB for Windows development to overcome these deficiencies in the software.
H.
The Current Status of Werner syndrome Research
Since the positional cloning of WRN, sequence analysis and biological assays have begun elucidating the various functional components of the gene product. First of all, WRN contains the core motifs of RecQ family helicases (Yu et al, 1996a). In fact, WRN is now known to exhibit ATP-dependent unwinding of DNA in the 3' to 5' direction (Suzuki et al, 1997; Gray et al, 1997), in a similar manner to RecQ (Umezu et al, 1990) and Sgs1 (Lu et al, 1996). Surprisingly, this activity, although diminished, is not abolished in a mutant protein carrying a truncation of the C-terminus (nt 3370 to 3464) due to a frameshift mutation downstream of the helicase motifs (Gray et al, 1997). This frameshift mutation represents a common WRN disease allele observed in Japanese Werner syndrome patients (Yu et al, 1996a). In general, the presence or the absence of the helicase domain does not influence the Werner syndrome phenotype, suggesting that a complete loss of function of the gene is generally responsible for the disease (Yu et al, 1997). Recent work indicates that the strand displacement activity of WRN is stimulated by single-stranded DNA-binding proteins (SSBs), in a manner similar to RecQ (Harmon and Kowalczykowski, 1998), which is suggestive of direct protein-protein interactions between WRN and SSBs (Shen et al, 1998).
The second potential functional modality inferred by sequence analysis is ribonuclease activity, since the N-terminus of WRN shares sequence similarity with domains characteristic of the RNase D gene family (Mian, 1997). This activity could be a "proof reading" 3' to 5' exonuclease activity. Interestingly, this N-terminus RNase D homologous sequence is completely absent in F18C5.2, the inferred WRN ortholog in C. elegans (this thesis). Thus, although F18C5.2 may be the WRN ortholog in nematodes, WRN homologs likely have an additional functional role in higher animals.
The third potential functional modality, again largely inferred by sequence analysis, is DNA binding. In this regard, a distal C-terminal domain, designated the HRDC (Helicase and RNase D C-terminal) domain, was characterized in WRN, BLM, SGS1 and RNase D (Morozov et al, 1997). This domain appears to be non-enzymatic in nature but highly conserved, with a predicted secondary structure dominated by hydrophobic helix-loop-helix domains. The HRDC domain appears to be a region of mutational susceptibility in Werner syndrome (Oshima et al, 1996) and abolishes BLM activity assayed in a yeast background (Lu et al, 1996).
Localization of a gene product is often a good clue as to its functional role in the cell. In the case of WRN, a putative DNA helicase, it is not surprising to find the gene product localized to the nucleoplasm of cells (Suzuki et al, 1997). Recent results provide a refinement on this observation, by placing the subcellular localization of human WRN in the nucleolus and in nucleolar biochemical fractions (Marciniak et al, 1998; Gray et al, 1998). This is consistent with a hypothesis that WRN may play a role in rDNA maintenance and transcription, hence ageing related effects upon the nucleolus. Such a role is invoked for Sgs1, the yeast homolog to WRN (Sinclair, Mills and Guarente, 1997). Interestingly, though, the mouse WRN homologue exhibits more diffuse localization in the nucleus (Marciniak et al, 1998).
Some progress is now being made in characterizing WRN gene regulation. Nucleolar activity of WRN appears to be modulated by tyrosine phosphorylation (Gray et al, 1998). The structure and function of the promoter is being characterized. Two transcription initiation sites appear to be present and the promoter exhibits features of being "constitutive". The activity in WRN patient cells also appears to be dramatically reduced, suggesting that the promoter is regulated positively by the wildtype WRN gene itself (Wang et al, 1998).
RecQ appears to act as both an initiator of homologous recombination and a disrupter of aberrant recombination processes (Harmon and Kowalczykowski, 1998). Sgs1, BLM and WRN appear to share the feature of suppressing illegitimate recombination (Yamagata et al, 1998). In fission yeast, the candidate BLM ortholog, rad12+ (rqh1+), appears to be involved in cell cycle checkpoint control and S phase arrest (Davey et al, 1998; Stewart et al, 1997). Finally, most recently, experimental analysis of DNA replication initiation foci in Xenopus laevis revealed that the WRN ortholog in this amphibian is the "replication focus forming activity 1" (FFA-1), exhibiting helicase activity essential to the proper aggregation of "replication protein A" (RPA) in, and function of, DNA replication foci (Yan et al, 1998). Interestingly, an MSA of this Xenopus laevis candidate ortholog against the human and mouse WRN orthologs (Figure 40) also shows high conservation of a portion of the subsequence observed to be directly repeated in human WRN (Figure 28) but not in Xenopus or Mus.
I. Future Directions
1. Thesis Work
The Human Genome Project (HGP) is proceeding beyond genome mapping and into large scale sequencing. Werner syndrome research is proceeding into a detailed experimental analysis of gene function in many labs, using a range of molecular techniques beyond the scope of this thesis project. In addition, much is currently known about DNA helicases. The bioinformatics work provided some limited information about WRN gene function by phylogenetic inferences. That bioinformatics remains merely a starting point for detailed laboratory "wet bench" experimentation is being proven true in the study of WRN. The primary legacy of the bioinformatics analysis in this thesis is the elucidation of an ortholog mapping between genes in other "model" organisms and the series of known human paralogs of WRN.
This should assist the correct assignment of model organism experimental data to the appropriate human gene and disorder.
2. Characterization of Gene Families
The bioinformatics approach taken in this thesis towards the characterization of gene families was the simple ascertainment of molecular phylogeny based upon the analysis of multiple sequence alignments. Candidate orthologous loci from evolutionarily distant species (in this case, nematodes and humans) defined by phylogenetic proximity were then examined by fine scale pairwise sequence comparisons to identify highly conserved subsequences outside the primary canonical motifs defining the encompassing gene family. One might ask what additional analysis one can do upon these alignments of candidate orthologous sequences. Orthologous genes of moderate evolutionary distance from each other would be expected to retain some identities in amino acid residues solely due to common descent and lack of mutation, rather than any functional selection of the residues. How can one distinguish between such residues and those with significant structural or functional roles? Many approaches might be considered:
• Subsequences can be selected based upon statistically high local concentrations of identities or conserved residue character (hydrophobicity, charge, etc).
• Bete analysis generally requires a lot of sequences to produce a useful analysis. A more limited form of Bayesian analysis using Dirichlet mixture priors on a limited number of candidate orthologous sequences (as few as two) might focus attention upon subsequence spans likely to be under biochemical constraints.
• The DNA coding of the residues can be compared: if codon mutational drift appears statistically constrained in a subsequence also exhibiting general amino acid residue conservation, then the subsequence may be under structure/function constraints.
• Subsequences could be tested against sequence or motif databases to detect their conservation in other proteins as motifs.
Then, once conserved subsequences are solidly ascertained in orthologous proteins, would it not then be possible to directly computationally infer structural and functional characteristics of the protein in a given gene family based upon such residue conservation? That is, conserved subsequences in true orthologs are likely to be located in identical locations in the protein, and to perform the same structure/function role in each member of the family. If a given subsequence is computationally predicted to be an alpha helix, beta strand, variable loop or a turn in the secondary structure, is it not likely that that given subsequence will have the same character in every orthologous protein? At a very minimum, such information could prove extremely valuable in deciphering structural data (X-ray or NMR) when the precise structure is not yet known, or in guiding the threading of the unknown protein sequence into known homologous protein structures.
2. ACEDB
Despite reservations by some genomic scientists that ACEDB is "yesterday's" genomic database, ACEDB's successful paradigm of community development and curation, plus general support for the database in several significant large scale genome sequencing centres, has ensured its survival and a steady expansion in its functionality. There is a modest movement at the Sanger Centre (the main activity centre for ACEDB) to formalize the software maintenance of ACEDB. Increasing software support for WWW interfaces is the focus of considerable effort by ACEDB developers. A formal reference manual for ACEDB may soon be written (Jean Thierry-Mieg, personal communication). It may be thus that ACEDB finds a new equilibrium of sustainability during the coming years of the steadily expanding Human Genome Project.
Throughout this thesis project, ACEDB development was pursued in three ways: the development of the Microsoft Windows port of ACEDB; the troubleshooting of primary ACEDB code; and the creation of the new "dendrogram" tree display functionality. This latter software innovation for ACEDB already promises to find diverse use in various corners of the ACEDB community (Sanger Centre, personal communications).
3. ACEDB for Windows
I did not suspect, in 1995, what kind of adventure would await me with the initiation of the Microsoft Windows ACEDB port project. The creation of this first functional Windows version of ACEDB has been repaid many times, in the popularity of the port worldwide. This fact was made especially clear to this candidate upon a family holiday in 1997 to Asia, where he was invited to give scientific presentations about ACEDB for Windows in the Philippines, Japan and Korea. This trip also coincided with the release by the Korean Rice Genome Program of a CDROM with ACEDB for Windows providing an interface to "RiceGenes" and "GrainGenes" - two plant genetic databases implemented in ACEDB. This CDROM was in broad circulation worldwide in the plant genomics community before this candidate was aware of it and before his Asian trip. For this reason, I was warmly received by rice genome scientists in the Philippines and Korea during my trip.
ACEDB for Windows, standalone version, is fairly robust at the present time; however, there remain several features incompletely implemented. Future work on the port (ancillary to my postdoctoral research work) will hopefully address these deficiencies. In particular, this candidate would like to do a better job of extending ACEDB for Windows into the external program functionality already found with the UNIX version of ACEDB. In addition, the job of implementing a WWW interface to ACEDB for Windows is a priority: it was an objective in this doctoral project which remains incomplete at this time.
This writer would also like to exploit Microsoft Windows specific capabilities for internet software. The back-end component of the ACEDB for Windows WWW clients, ACEDB client/server functionality, currently in "alpha" testing, also remains to be fully debugged and integrated with existing UNIX server systems within "real" production network environments.
4. The Future of Bioinformatics
A revolution in increased computing performance for decreased costs was observed throughout the duration of this thesis project. Concurrently, network distributed computing took hold in the form of the WWW. In the realm of biology, DNA sequencing took on a whole new meaning, as genome projects throughout the world continue to tackle increasingly large fragments of diverse genomes, in particular, the human genome. Bioinformatics is at the forefront, seeking to exploit the fruits of the former revolution to manage the chaos of the latter revolution. The future challenge of bioinformatics is to meet the following criteria in biological data management and analysis:
• Usability: are the algorithms, programs and user interfaces accessible to the average biological scientist? How steep is the learning curve? What are the resource demands? How accessible are the tools?
• Accuracy: are the answers given by the tools correct? What are the limitations in quality? How does the output of one program compare with another? Can a given software tool "justify" its answer?
• Integration: do the tools lend themselves to easy (semi-)automation of the analytical process? Does one given tool interface easily with another? Can biologists who are not bioinformatics specialists find their way, easily and systematically, among the tools to answer their research questions?
ABBREVIATIONS
ACEDB: "A Caenorhabditis elegans DataBase"
bp: basepair(s) of sequence length; "kb" is kilobase pairs
CEPH: Centre d'Etudes de Polymorphismes Humaines (Paris, France)
ddNTP: Dideoxy nucleotide (in Sanger sequencing)
dH2O: Distilled, deionized (generally autoclaved) water
DEAE: Diethyl aminoethyl cellulose paper
DMSO: Dimethyl Sulphoxide
DTT: Dithiothreitol
EDTA: Ethylenediaminetetraacetic acid
EMS: Ethylmethanesulphonate
EST: Expressed sequence tag
GNRH: Gonadotropin-releasing hormone (synonyms: GRH; LHRH, Luteinizing Hormone Releasing Hormone)
GSR: Glutathione reductase gene locus
GTF2E2: Locus of the p34 (beta) subunit of general transcription factor IIE
HMM: Hidden Markov Model
LOD: Log likelihood of odds (linkage map probability scoring function)
MIM: Mendelian Inheritance in Man
MSA: Multiple sequence alignment
NCBI: National Center for Biotechnology Information
PAGE: Polyacrylamide gel electrophoresis
PIC: Polymorphic Information Content
PCR: Polymerase Chain Reaction
PFGE: Pulsed Field Gel Electrophoresis
PMSF: Phenylmethylsulfonyl fluoride
PPP2CB: Locus of the beta isoform of the catalytic subunit of protein phosphatase 2A
SDS: Sodium Dodecyl Sulfate
STR(P): Simple Tandem Repeat (Polymorphism)
STS: Sequence Tagged Site
URL: Universal (or Uniform) Resource Locator (for internet resources)
WWW: World Wide Web
YAC: Yeast Artificial Chromosome
BIBLIOGRAPHY
Adelman JP, Mason AJ, Hayflick JS and Seeburg PH. (1986) Isolation of the gene and hypothalamic cDNA for the common precursor of gonadotropin-releasing hormone and prolactin releasing-inhibiting factor in human and rat. Proc. Natl. Acad. Sci. USA 83:179-183
Altschul S, Gish W, Miller W, Myers E and Lipman D. (1990) Basic local alignment search tool. J. Mol. Biol. 215:403-10.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res. 25(17):3389-3402
Anand R, Villasante A and Tyler-Smith C. (1989) Construction of yeast artificial chromosome libraries with large inserts using fractionation by pulsed-field gel electrophoresis. Nucleic Acids Res. 17:3425-3433
Attwood TK, Beck ME, Flower DR, Scordis P and Selley J. (1998) The PRINTS protein fingerprint database in its fifth year. Nucleic Acids Res. 26(1):304-308.
Bairoch A, Bucher P and Hofmann K. (1997) The PROSITE database, its status in 1997. Nucleic Acids Res. 25:217-221
Bassett Jr DE, Boguski MS, Spencer F, Reeves R, Goebl M and Hieter P. (1997) Comparative genomics, genome cross-referencing and XREFdb. Trends in Genetics 11:372-373
Bellanne-Chantelot C, Lacroix B, Ougen P, Billault A, Beaufils S, Bertrand S, Georges I, Glibert F, Gros I, Lucotte G, Susini L, Codani JJ, Gesnouin P, Pook S, Vaysseix G, Lu-Kuo J, Ried T, Ward D, Chumakov I, Le Paslier D, Barillot E and Cohen D. (1992) Mapping the whole human genome by fingerprinting yeast artificial chromosomes. Cell 70:1059-68
Birnboim HC and Doly J. (1979) A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7(6):1513-23
Botstein D, White RL, Skolnick M and Davis RW. (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32(3):314-331.
Brass N, Heckel D, Sahin U, Pfreundschuh M, Sybrecht GW and Meese E. (1997) Translation initiation factor eIF-4gamma is encoded by an amplified gene and induces an immune response in squamous cell lung carcinoma. Hum. Molec. Genet. 6:33-39
Brenner S. (1974) The genetics of Caenorhabditis elegans. Genetics 77:71-94.
Brown MP, Hughey R, Krogh A, Mian IS, Sjolander K and Haussler D.
(1997) Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Hunter L, Searls D and Shavlik J. Eds. ISMB-93, Menlo Park CA AAAI/MIT Press, pp. 47-55
Brownstein BH, Silverman GA, Little RD, Burke DT, Korsmeyer SJ, Schlessinger D and Olson MV. (1989) Isolation of single-copy human genes from a library of yeast artificial chromosome clones. Science 244(4910):1348-1351
Bruskiewich R, Everson T, Ma L, Chan L, Schertzer M, Giacobino JP, Muzzin P and Wood S. (1996) Analysis of CA repeat polymorphisms places three gene loci on the 8p linkage map. Cytogenet. Cell Genet. 73:331-333
Bruskiewich R, Schertzer M and Wood S. (1997) A Long Range Physical Map Spanning the Werner syndrome Region. Genome 40:77-83
Bruskiewich R, Rose A and Wood S. (1997) Functional Analysis Of A Werner syndrome Gene Homolog In C. elegans. Poster presented at 11th Int'l C. elegans Conference, Madison, WI (June-July, 1997)
Buckler A, Chang D, Graw S, Brook J, Haber D, Sharp P and Housman D. (1991) Exon amplification: a strategy to isolate mammalian genes based on RNA splicing. Proc. Natl. Acad. Sci. USA 88:4005-9
Burke D, Carle G and Olson M. (1987) Cloning of large segments of exogenous DNA into yeast by means of yeast artificial chromosome vectors. Science 236:809-12
Cattanach BM, Iddon CA, Charlton HM, Chiappa SA and Fink G. (1977) Gonadotropin-releasing hormone deficiency in a mutant mouse with hypogonadism. Nature 269:338-340.
Chaffanet M, Imbert A, Adelaide J, Lepaslier D, Wagner MJ, Wells DE, Birnbaum D and Pebusque MJ. (1996) A 3.1 MB YAC contig within the Werner syndrome region on the short arm of human chromosome 8. Cytogenet. Cell Genet. 72:63-68
Chaganti RSK, Schonberg S and German J. (1974) A manyfold increase in sister chromatid exchanges in Bloom's syndrome lymphocytes. Proc. Natl. Acad. Sci. USA 71:4508-4512.
Chumakov I, Rigault P, Le Gall I, Bellanne-Chantelot C, Billault A, Guillou S, Soularue P, Guasconi G, Poullier E, Gros I, Belova M, Sambucy JL, Susini L, Gervy P, Gilbert F, Beaufils S, Bui H, Perrot V, Saumier M, Soravito C, Bahouayila R, Cohen-Akenine A, Barillot E, Bertrand S, Codani JJ, Caterina D, Georges I, Lacroix B, Lucotte G, Sahbatou M, Schmit C, Sangouard M, Tubacher E, Dib C, Faure S, Fizames C, Gyapay G, Millasseau P, NGuyen S, Muselet D, Vignal A, Morisset J, Menninger J, Lieman J, Desai T, Banks A, Bray-Ward P, Ward D, Hudson T, Gerety S, Foote S, Stein L, Page D, Lander E, Weissenbach J, Le Paslier D and Cohen D. (1995) A YAC contig map of the human genome. Nature 377 supp.:175-183
Cohen D, Chumakov I and Weissenbach J. (1993) A first-generation physical map of the human genome. Nature 366(6456):698-701
Cole C, Goodfellow P, Borrow M and Bentley D. (1991) Generation of novel sequence tagged sites (STSs) from discrete chromosomal regions using alu-PCR. Genomics 10:816-826
Collins J and Hohn B. (1978) Cosmids: a type of plasmid gene-cloning vector that is packageable in vitro in bacteriophage heads. Proc. Natl. Acad. Sci. USA 75:4242-6
Cox D, Burmeister M, Price E, Kim S and Myers R. (1990) Radiation hybrid mapping: a somatic cell genetic method for constructing high-resolution maps of mammalian chromosomes. Science 250:245-50
Cristofalo VJ and Pignolo RJ. (1993) Replicative senescence of human fibroblast-like cells in culture. Physiol. Rev. 73(3):617-638.
Dausset J, Cann H, Cohen D, Lathrop M, Lalouel J and White R. (1990) Centre d'etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics 6:575-77
Davey S, Han CS, Ramer SA, Klassen JC, Jacobson A, Eisenberger A, Hopkins KM, Lieberman HB and Freyer GA. (1998) Fission yeast rad12+ regulates cell cycle checkpoint control and is homologous to the Bloom's syndrome disease gene. Mol. Cell. Biol.
18(5):2721-2728
Deaven LL, Van Dilla MA, Bartholdi MF, Carrano AV, Cram LS, Fuscoe JC, Gray JW, Hildebrand CE, Moyzis RK and Perlman J. (1986) Construction of human chromosome-specific DNA libraries from flow-sorted chromosomes. Cold Spring Harbor Symp. Quant. Biol., Volume LI, p. 159-167
Drapkin R, Reardon JT, Ansari A, Huang JC, Zawel L, Ahn K, Sancar A and Reinberg D. (1994) Dual role of TFIIH in DNA excision repair and in transcription by RNA polymerase II. Nature 368:769-772.
Dretzen G, Bellard M, Sassone-Corsi P and Chambon P. (1981) A reliable method for the recovery of DNA fragments from agarose and acrylamide gels. Anal. Biochem. 112:295
Duyk G, Kim S, Myers R and Cox D. (1990) Exon trapping: a genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc. Natl. Acad. Sci. USA 87:8995-9
Ellis NA, Groden J, Ye T-Z, Straughen J, Lennon DJ, Ciocci S, Proytcheva M and German J. (1995) The Bloom's syndrome gene product is homologous to RecQ helicases. Cell 83:655-666.
Ellis NA and German J. (1996) Molecular genetics of Bloom's syndrome. Hum. Mol. Genet. 5:1457-1463
Emons G, Muller V, Ortmann O and Schulz KD. (1998) Effects of LHRH-analogues on mitogenic signal transduction in cancer cells. J. Steroid Biochem. Mol. Biol. 65(1-6):199-206
Epstein CJ, Martin GM, Schultz AL and Motulsky A. (1966) Werner syndrome. Medicine 45:177-221.
Epstein HF and Shakes DC Eds. (1995) Caenorhabditis elegans: modern biological analysis of an organism. Academic Press, San Diego
Etzold T, Ulyanov A and Argos P. (1996) SRS: Information Retrieval System for Molecular Biology Data Banks. Methods in Enzymology 266:114-128
Faragher RGA, Kill IR, Hunter JAA, Pope FM, Tannock C and Shall S. (1993) The gene responsible for Werner syndrome may be a cell division "counting" gene. Proc. Natl. Acad. Sci. USA 90:12030-12034.
Feinberg A and Vogelstein B. (1984) A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal.
Biochem. 132:6-13
Finch CE. (1991) Longevity, Senescence, and the Genome. University of Chicago Press.
Fire A, Albertson D, Harrison SW and Moerman DG. (1991) Production of antisense RNA leads to effective and specific inhibition of gene expression in C. elegans muscle. Development 113:503-514
Friedberg EC, Walker GC and Siede W. (1995) DNA repair and mutagenesis. ASM Press.
Fujiwara Y, Higashikawa T and Tatsumi M. (1977) A retarded rate of DNA replication and normal level of DNA repair in Werner syndrome fibroblasts in culture. J. Cell Physiol. 92:365-374.
Fukuchi K, Tanaka K, Nakura J, Kumahara Y, Uchida T and Okada Y. (1985) Elevated spontaneous mutation rate in SV40-transformed Werner syndrome fibroblast cell lines. Somat. Cell Mol. Genet. 11:303-308.
Fukuchi K, Martin GM and Monnat RJ. (1989) Mutator phenotype of Werner syndrome is characterized by extensive deletions. Proc. Natl. Acad. Sci. USA 86:5893-5897.
Fukuchi K, Tanaka K, Kumahara Y, Marumo K, Pride MB, Martin GM and Monnat RJ. (1990) Increased frequency of 6-thioguanine resistant peripheral blood lymphocytes in Werner syndrome patients. Human Genet. 84:249-252.
Gangloff S, McDonald JP, Bendixen C, Arthur L and Rothstein R. (1994) The yeast type I topoisomerase Top3 interacts with Sgs1, a DNA helicase homolog: a potential eukaryotic reverse gyrase. Mol. Cell. Biol. 14:8391-8398.
Gebhart E, Schinzel M and Ruprecht KW. (1985) Cytogenetic studies using various clastogens in two patients with Werner syndrome and control individuals. Human Genet. 70:324-327.
Gebhart E, Bauer R, Raub U, Schinzel M, Ruprecht KW and Jonas JB. (1988) Spontaneous and induced chromosomal instability in Werner syndrome. Human Genet. 80:135-139.
Gibbons RJ, Picketts DJ, Villard L and Higgs DR. (1995) Mutations in a putative global transcriptional regulator cause X-linked mental retardation with alpha-thalassemia (ATR-X syndrome).
Cell 80:837-845
Goddard KAB, Yu CE, Oshima J, Miki T, Nakura J, Piussan C, Martin GM, Schellenberg GD, Wijsman EM and members of the International Werner syndrome Collaborative Group. (1996) Toward Localization of the Werner syndrome Gene by Linkage Disequilibrium and Ancestral Haplotyping: Lessons Learned from Analysis of 35 Chromosome 8p11.1-21.1 Markers. Am. J. Hum. Genet. 58:1286-1302
Gorbalenya AE and Koonin EV. (1993) Helicases: amino acid sequence comparisons and structure-function relationships. Curr. Opin. Struct. Biol. 3:419-429
Goto M, Rubenstein M, Weber J, Woods K and Drayna D. (1992) Genetic linkage of Werner syndrome to five markers on chromosome 8. Nature 355:735-8
Gray MD, Shen JC, Kamath-Loeb AS, Blank A, Sopher BL, Martin GM, Oshima J and Loeb LA. (1997) The Werner syndrome protein is a DNA helicase. Nature Genetics 17:100-103
Gray MD, Wang L, Youssoufian H, Martin GM and Oshima J. (1998) Werner helicase is localized to transcriptionally active nucleoli of cycling cells. Exp. Cell Res. 242(2):487-494
Green ED and Olson MV (1990) Systematic screening of yeast artificial chromosome libraries by use of the polymerase chain reaction. Proc. Natl. Acad. Sci. USA 87:1213-1217
Grunstein M and Hogness DS (1975) Colony hybridization: A method for the isolation of cloned DNAs that contain a specific gene. Proc. Natl. Acad. Sci. USA 72:3961-5
Gyapay G, Morissette J, Vignal A, Dib C, Fizames C, Millasseau P, Marc S, Bernardi G, Lathrop M and Weissenbach J. (1994) The 1993-94 Genethon human genetic linkage map. Nat Genet (2 Spec No):246-339
Hand R, German J. (1975) A retarded rate of DNA chain growth in Bloom's syndrome. Proc. Natl. Acad. Sci. USA 72:758-762.
Harmon FG and Kowalczykowski SC. (1998) RecQ helicase, in concert with RecA and SSB proteins, initiates and disrupts DNA recombination. Genes Develop. 12(8):1134-1144
Harris N, Dutlow C, Eidne K, Dong KW, Roberts J and Millar R.
(1991) Gonadotropin-releasing hormone gene expression in MDA-MB-231 and ZR-75-1 breast carcinoma cell lines. Cancer Res. 51:2577-2581
Hayflick, L. (1981) The biology of human aging. Plastic Recons. Surg. 67:536-550
Henikoff S, Endow SA and Greene, EA (1996) Connecting protein family resources using the proWeb network. TIBS 21:444-445
Henikoff S and Henikoff JG. (1994) Protein family classification based on searching a database of blocks. Genomics 19:97-107
Henikoff S, Henikoff JG, Alford WJ and Pietrokovski S. (1995) Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene-COMBIS, Gene 163:GC17-26
Hochgeschwender U. (1992) Toward a transcriptional map of the human genome. Trends Genet. (2):41
Holliday R, Thompson KVA, Huschtscha LI, Rattan SIS, Sedgwick SG, Spanos A. (1985) Experimental studies on Werner syndrome fibroblasts, in Werner syndrome and human aging (Salk D, Fujiwara Y, Martin GM, eds) pp. 331-339. Plenum Press.
Howell AM, Gilmour SG, Mancebo RA and Rose AM. (1987) Genetic analysis of a large autosomal region in C. elegans by the use of a free duplication. Genetical Research 49:207-213
Hudson T, Stein L, Gerety S, Ma J, Castle A, Silva J, Slonim D, Baptista R, Kruglyak L, Xu S-H, Hu X, Colbert AME, Rosenberg C, Reeve-Daly MP, Rozen S, Hui L, Wu X, Vestergaard C, Wilson KM, Bae JS, Maitra S, Ganiatsas S, Evans CA, DeAngelis MM, Ingalls KA, Nahf RW, Horton LT, Anderson MO, Collymore AJ, Ye W, Kouyoumjian V, Zemsteva IS, Tam J, Devine R, Courtney DF, Renaud MT, Nguyen H, O'Connor TJ, Fizames C, Faure S, Gyapay G, Dib C, Morissette J, Orlin JB, Birren BW, Goodman, Weissenbach J, Hawkins TL, Foote S, Page DC, and Lander ES. (1995) An STS-based map of the human genome. Science 270:1945-1954
Imai A, Takagi A, Horibe S, Takagi H and Tamaya T. (1998) Fas and Fas ligand system may mediate antiproliferative activity of gonadotropin-releasing hormone receptor in endometrial cancer cells. Int. J. Oncol.
13(1):97-100
Imbert A, Chaffanet M, Essioux L, Noguchi T, Adelaide J, Kerangueven F, Le Paslier D, Bonaiti-Pellie C, Sobol H, Birnbaum D and Pebusque MJ. (1996) Integrated Map of the Chromosome 8p12-p21 Region, a Region Involved in Human Cancers and Werner syndrome. Genomics 32:29-38
Imura H, Nakao Y, Kuzuya H, Okamoto M, Okamoto M and Yamada K. (1985) Clinical, Endocrine and Metabolic Aspects of the Werner syndrome Compared with those of Normal Aging, in Werner syndrome and Human Aging (Salk D, Fujiwara Y, Martin GM, eds) pp. 305-312. Plenum Press.
Imamura O, Ichikawa K, Yamabe Y, Goto M, Sugawara M and Furuichi Y. (1997) Cloning of a Mouse Homologue of the Human Werner Syndrome Gene and Assignment to 8A4 by Fluorescence in situ Hybridization. Genomics 41:298-300
Irmer G, Burger C, Ortmann O, Schulz KD and Emons G. (1994) Expression of the luteinizing hormone-releasing hormone (LHRH) and its mRNA in human endometrial cancer cell lines. J. Clin. Endocrin. Metab. 79:916-919
Irmer G, Burger C, Muller R, Ortmann O, Peter U, Kakar SS, Neill JD, Schulz KD and Emons G. (1995) Expression of the messenger RNAs for luteinizing hormone-releasing hormone (LHRH) and its receptor in human ovarian epithelial carcinoma. Cancer Res. 55:817-822
Janke DL, Schein JE, Ha T, Franz NW, O'Neil NJ, Vatcher GP, Stewart HI, Kuervers LM, Baillie DL and Rose AM. (1997) Interpreting a Sequenced Genome: Toward a Cosmid Transgenic Library of Caenorhabditis elegans. Genome Research 7(10):974-985
Jansen G, Hazendonk E, Thijssen KL, and Plasterk RHA. (1997) Reverse genetics by chemical mutagenesis in C. elegans. Nature Genetics 17:119-121.
Johnson TE and Hutchinson EW (1990) Aging in C. elegans: Update 1988. Review of Biological Research in Aging 4:15-27
Jones RM, MacDonald ME, Branda J, Altherr MR, Louis DN and Schmidt EV. (1997) Assignment of the human gene encoding eukaryotic initiation factor 4E (eIF4E) to the region q21-25 on chromosome 4. Somat. Cell Molec. Genet.
23:221-223
Karlin S and Altschul SF. (1990) Methods for Assessing the Statistical Significance of Molecular Sequence Features by Using General Scoring Schemes. Proc. Natl. Acad. Sci. USA 87:2264-2268
Kerangueven F, Essioux L, Dib A, Noguchi T, Allione F, Geneix J, Longy M, Lidereau R, Eisinger F, Pebusque MJ, Jacquemier J, Bonaiti-Pellie C, Sobol H and Birnbaum D. (1995) Loss of heterozygosity and linkage analysis in breast carcinoma: an indication for a putative third susceptibility gene on the short arm of chromosome 8. Oncogene 10:1023-1026
Khew-Goodall Y, Mayer R, Maurer F, Sone S and Hemmings B. (1991) Structure and transcriptional regulation of protein phosphatase 2A catalytic subunit genes. Biochemistry 30:89-97
Kieras F, Brown W, Houck G and Zebrower M. (1986) Elevation of hyaluronic acid in Werner syndrome and progeria. Biochem. Med. Metab. Biol. 36:276-282.
Kirchgessner C, Patil C, Evans J, Cuomo C, Fried L, Carter T, Oettinger M and Brown J. (1995) DNA-Dependent Kinase p350 as a candidate gene for murine SCID defect. Science 267:1178-1183
Klass MR (1977) Aging in the nematode C. elegans: Major biological and environmental factors influencing life span. Mechanisms of Ageing & Development 6:413-429
Kleene SC (1956) Representation of events in nerve nets and finite automata. In Automata Studies (Shannon and McCarthy, Eds.) Princeton University Press, pp. 3-40
Korolev S, Yao N, Lohman T, Weber P and Waksman G. (1998) Comparisons between the structures of HCV and Rep helicases reveal structural similarities between SF1 and SF2 super-families of helicases. Protein Science 7:605-610
Krogh A, Brown M, Mian I, Sjolander K and Haussler D. (1994) Hidden Markov models in computational biology: Applications to protein modelling. J. Mol. Biol. 235:1501-31
Kruk PA, Rampino NJ, Bohr VA. (1995) DNA damage and repair in telomeres: relation to aging. Proc. Natl. Acad. Sci. USA 92:258-262.
Lander E and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84:2363-2367
Lathrop G, Lalouel J, Julier C and Ott J. (1984) Strategies for multilocus linkage analysis in humans. Proc. Natl. Acad. Sci. USA 81:3443-6
Lathrop G, Lalouel J, Julier C and Ott J. (1985) Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am. J. Hum. Genet. 37:482-98
Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, and Wootton JC (1993) Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science 262:208-214
Lewis JA and Fleming JT. (1995) Basic Culture Methods in Chapter 1, Methods in Cell Biology, Volume 48: Caenorhabditis elegans: Modern Biological Analysis of an Organism, Academic Press, San Diego.
Limonta P, Dondi D, Moretti RM, Fermo D, Garattini E and Motta M. (1993) Expression of luteinizing hormone-releasing hormone mRNA in the human prostatic cancer cell line LNCaP. J. Clin. Endocrin. Metab. 76:797-800
Lombard DB and Guarente L. (1996) Cloning the gene for Werner syndrome: a disease with many symptoms of premature aging. Trends Genet. 12(8):283-286
Lu J, Mullen JR, Brill SJ, Kleff S, Romeo AM and Sternglanz R. (1996) Human homologues of yeast helicase. Nature 383:678-679
Marciniak RA, Lombard DB, Johnson FB and Guarente L. (1998) Nucleolar localization of the Werner syndrome protein in human cells. Proc. Natl. Acad. Sci. USA 95(12):6887-92
Maruyama, Ichi. (1997; personal communication) The unc-13 gene violates "GT-AG" rule of pre-mRNA splicing. 1997 International Worm Meeting Abstract #389
Mason AJ, Hayflick JS, Zoeller RT, Young WS, Phillips HS, Nikolics K and Seeburg PH. (1986) A deletion truncating the gonadotropin-releasing hormone gene is responsible for hypogonadism in the "hpg" mouse. Science 234:1366-1371
Matson SW, Bean DW and George JW. (1994) DNA helicases: enzymes with essential roles in all aspects of DNA metabolism.
Bioessays 16(1):13-22.
McDowall JS and Rose A (1997) Alignment of the genetic and physical maps in the dpy-5 bli-4 (I) region of C. elegans by serial cosmid rescue of lethal mutations. Molec. & Gen. Genet. 255:78-95
McEntyre J. (1998) Linking up with Entrez. Trends Genet 14(1):39-40
Melaragno MI, Pagni D, Smith MAC. (1995) Cytogenetics of Werner syndrome lymphocyte cultures. Mech Ageing Dev 78:117.
Mian IS. (1997) Comparative sequence analysis of ribonucleases HII, III, II, PH and D. Nucl. Acids Res. 25(16):3187-3195
Miller RA. (1995) Gerontology: The Study of Aging as the Study of Cancer, in Molecular Aspects of Aging, Esser K and Martin GM, Editors. John Wiley & Sons
Miller G, Fuchs R and Lai E. (1997) IMAGE cDNA Clones, UniGene Clustering and AceDB: An Integrated Resource for Expressed Sequence Information. Genome Res. 7:1027-1032
Monnat RJ, Hackmann AFM, Chiaverotti TA. (1992) Nucleotide sequence analysis of human hypoxanthine phosphoribosyltransferase gene deletions. Genomics 13:777-787.
Morozov V, Mushegian AR, Koonin EV and Bork P. (1997) A putative nucleic acid-binding domain in Bloom's and Werner's syndrome helicases. TIBS 22:417-418
Moulder G and Barstead R. (1997) Reverse Genetics: Isolating Deletions in PCR Screens of Mutagenized Populations. WWW URL:
Muramatsu T and Kincaid RL (1992) Molecular cloning and chromosomal mapping of the human gene for the testis-specific catalytic subunit of calmodulin-dependent protein phosphatase (calcineurin A). Biochem. Biophys. Res. Commun. 188:265-271
Myers R, Larin Z and Maniatis T. (1985) Detection of single base substitutions by ribonuclease cleavage at mismatches in RNA:DNA duplexes. Science 230:1242-6
Nagaraja R. (1992) Chapter 1 in Techniques for the Analysis of Complex Genomes, Academic Press, London
Nakayama Y, Wondisford FE, Lash RW, Bale AE, Weintraub BD, Cutler GB and Radovick S.
(1990) Analysis of Gonadotropin-Releasing Hormone Gene Structure in Families with Familial Central Precocious Puberty and Idiopathic Hypogonadotropic Hypogonadism. J. Clin. Endocrinol. Metabol. 70(5):1233-1238.
Nakura J, Miki T, Kamino K, Wijsman EMN, Yu C, Oshima J, Fukuchi K, Weber JL, Piussan C, Melaragno MI et al. (1994) Homozygosity mapping of the Werner syndrome locus. Genomics 23(3):600-608
Nakura J, Miki T, Ye L, Mitsuda N, Zhao Y, Kihara K, Yu CE, Oshima J, Fukuchi KI, Wijsman EM, Schellenberg GD, Martin GM, Murano Si, Hashimoto K, Fujiwara Y, Ogihara T (1996) Narrowing the position of the Werner syndrome locus by homozygosity analysis-extension of homozygosity analysis. Genomics 36(1):130-41
Needleman SB and Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. 48:443-453
Nelson D, Ledbetter S, Corbo L, Victoria M, Ramirez-Solis R, Webster TD, Ledbetter DH and Caskey CT. (1989) Alu polymerase chain reaction: a method for rapid isolation of human-specific sequences from complex DNA sources. Proc. Natl. Acad. Sci. USA 86:6686-6690
Nikaido O, Nishida T, Shima A. (1985) Cellular mechanisms of aging in the Werner syndrome. Adv Exp Med Biol 190:421-438.
Olson M, Hood L, Cantor C and Botstein D. (1989) A common language for physical mapping of the human genome. Science 245:1434-5
Orita M, Suzuki Y, Sekiya T and Hayashi K. (1989) Rapid and sensitive detection of point mutations and DNA polymorphisms using the polymerase chain reaction. Genomics 5:874-9
Oshima J, Yu C, Boehnke M, Weber J, Edelhoff S, Wagner M, Wells E, Wood S, Disteche C, Martin G and Schellenberg G. (1994) Integrated mapping analysis of the Werner syndrome region of chromosome 8. Genomics 23:100-113
Oshima J, Yu CE, Piussan C, Klein G, Jabkowski J, Balci S, Miki T, Nakura J, Ogihara T, Ells J, Smith M, Melaragno MI, Fraccaro M, Scappaticci S, Matthews J, Ouais S, Jarzebowicz A, Schellenberg GD and Martin GM.
(1996) Homozygous and compound heterozygous mutations at the Werner syndrome locus. Hum. Mol. Gen. 5:1909-1913
Ott J. (1985) Analysis of human genetic linkage. Johns Hopkins University Press
Pearson WR and Lipman DJ (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444-2448
Eisen J. (1998) Phylogenomics: Improving Functional Predictions for Uncharacterized Genes by Evolutionary Analysis. Genome Research 8:163-167
Plasterk, RHA. (1992) Reverse genetics of Caenorhabditis elegans. BioEssays 14:629-633.
Plasterk, RHA, and Groenen, JTM. (1992) Targeted alterations of the Caenorhabditis elegans genome by transgene instructed DNA double strand break repair following Tc1 excision. EMBO J. 11:287-290.
Puranam KL and Blackshear PJ. (1994) Cloning and characterization of RECQL, a potential human homolog of Escherichia coli DNA helicase RecQ. JBC 269(47):29838-29845.
Riddle D. (ed) (1997) C. elegans II. Cold Spring Harbor Laboratory Press, Plainview, N.Y.
Rose MR and Archer MA. (1996) Genetic analysis of mechanisms of aging. Curr Op Gen Dev 6:366-370.
Rowen L, Mahairas G and Hood L. (1996) Sequencing the Human Genome. Science 278(5338):605
Runger TM, Bauer C, Dekant B, Moller K, Sobotta, Czerny C, Poot M, Martin GM. (1994) Hypermutable ligation of plasmid DNA ends in cells from patients with Werner syndrome. J Invest Dermatol 102:45-48.
Saitou N and Nei M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4(4):406-425
Salk D, Au K, Hoehn H, Martin GM. (1981) Cytogenetics of Werner syndrome cultured skin fibroblasts: variegated translocation mosaicism. Cytogenet Cell Genet 30:92-107.
Salk D, Bryant E, Hoehn H, Johnston P, Martin GM (1985) Growth characteristics of Werner syndrome cells in vitro, in Werner syndrome and human aging (Salk D, Fujiwara Y, Martin GM, eds) pp. 305-312. Plenum Press.
Sanger F, Nicklen S and Coulson AR.
(1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
Sankoff D. (1972) Matching sequences under deletion-insertion constraints. Proc. Natl. Acad. Sci. USA 68:4-6
Sankoff D. (1975) Minimal mutation trees of sequences. SIAM J. Appl. Math. 78:35-42
Sapru M, Gu J, Gu X, Smith D, Yu C, Wells D and Wagner M. (1994) A panel of radiation hybrids for human chromosome 8. Genomics 21:208-216
Schaeffer L, Roy R, Humbert S, Moncollin V, Vermeulen W, Hoeijmakers JHJ, Chambon P and Egly J-M. (1993) DNA repair helicase: a component of BTF2 (TFIIH) basic transcription factor. Science 260:58-63.
Schaeffer L, Moncollin V, Roy R, Staub A, Mezzina M, Sarasin A, Weeda G, Hoeijmakers JHJ and Egly JM. (1994) The ERCC2/DNA repair protein is associated with the class II BTF/TFIIH transcription factor. EMBO J. 13:2388-2392.
Schellenberg G, Martin G, Wijsman E, Nakura J, Miki T and Ogihara T. (1992) Homozygosity mapping and Werner syndrome. Lancet 339(8799):1002
Schlessinger D. (1990) Yeast artificial chromosomes: tools for mapping and analysis of complex genomes. Trends Genet. 6:248-58
Schug J and Overton GC. (1997) TESS: Transcription Element Search Software on the WWW. Technical Report CBIL-TR-1997-1001-v0.0, of the Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania. URL:
Schuler GD, Altschul SF, and Lipman DJ (1991) A Workbench for Multiple Alignment Construction and Analysis. Prot. Struct. Func. Genet. 9:180-190
Schuler G. D., Boguski M. S., Stewart E. A., Stein L. D., G. Gyapay, K. Rice, R. E. White, P. Rodriguez-Tomé, A. Aggarwal, E. Bajorek, S. Bentolila, B. B. Birren, A. Butler, A. B. Castle, N. Chiannilkulchai, A. Chu, C. Clee, S. Cowles, P. J. R. Day, I. Dibling, N. Drouot, I. Dunham, S. Duprat, C. East, C. Edwards, J.-B. Fan, N. Fang, C. Fizames, C. Garrett, L. Green, D. Hadley, M. Harris, P. Harrison, S. Brady, A. Hicks, E. Holloway, L. Hui, S. Hussain, C. Louis-Dit-Sully, J. Ma, A.
MacGilvery, C. Mader, A. Maratukulam, T. C. Matise, K. B. McKusick, J. Morissette, A. Mungall, D. Muselet, H. C. Nusbaum, D. C. Page, A. Peck, S. Perkins, M. Piercy, F. Qin, J. Quackenbush, S. Ranby, T. Reif, S. Rozen, C. Sanders, X. She, J. Silva, D. K. Slonim, C. Soderlund, W.-L. Sun, P. Tabar, I. Thangarajah, N. Vega-Czarny, D. Vollrath, S. Voyticky, T. Wilmer, X. Wu, M. D. Adams, C. Auffray, N. A. R. Walter, R. Brandon, A. Dehejia, P. N. Goodfellow, R. Houlgatte, J. R. Hudson Jr., S. E. Ide, K. R. Iorio, W. Y. Lee, N. Seki, T. Nagase, K. Ishikawa, N. Nomura, C. Phillips, M. H. Polymeropoulos, M. Sandusky, K. Schmitt, R. Berry, K. Swanson, R. Torres, J. C. Venter, J. M. Sikela, J. S. Beckmann, J. Weissenbach, R. M. Myers, D. R. Cox, James M. H., Bentley D., Deloukas P., Lander E. S., and Hudson TJ. (1996) A Gene Map of the Human Genome. Science 274:540-546
Schulz VP, Zakian VA. (1994) The Saccharomyces PIF1 DNA helicase inhibits telomere elongation and de novo telomere formation. Cell 76:145-155.
Schwartz D and Cantor C. (1984) Preparation of yeast chromosome-sized DNAs by pulse field gel electrophoresis. Cell 37:67-75
Seeburg PH and Adelman JP. (1984) Characterization of cDNA for precursor of human luteinizing hormone releasing hormone. Nature 311:666-668
Sellers P. (1974) On the theory and computation of evolutionary distances. SIAM J. Appl. Math. 26:787-793
Shen JC, Gray MD, Oshima J and Loeb LA. (1998) Characterization of Werner syndrome protein DNA helicase activity: directionality, substrate dependence and stimulation by replication protein A. Nucl. Acids Res. 26(12):2879-85
Shizuya H, Birren B, Kim UJ, Mancino V, Slepak T, Tachiiri Y and Simon M. (1992) Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl. Acad. Sci. USA 89(18):8794-7
Sinclair DA, Mills K and Guarente L. (1997) Accelerated Aging and Nucleolar Fragmentation in Yeast sgs1 Mutants.
Science 277:1313-1316
Sjolander K, Karplus K, Brown MP, Hughey R, Krogh A, Mian IS and Haussler D. (1996) Dirichlet mixtures: A method for improving detection of weak but significant protein sequence homology. CABIOS 12(4):327-345
Sjolander K. (1998a) Phylogenetic Inference in Protein Superfamilies: Analysis of SH2 Domains. ISMB-98.
Sjolander K. (1998b) Bayesian Evolutionary Tree Estimation. Math. Model. Scient. Comp. (in press)
Smith TF and Waterman MS. (1981) The identification of common molecular subsequences. J. Mol. Biol. 147:195-197
Soderlund C and Dunham I. (1995) SAM: a system for iteratively building marker maps. CABIOS 11:645-655.
Sonnhammer EL and Durbin R. (1997) Analysis of protein domain families in Caenorhabditis elegans. Genomics 46(2):200-16
Sonnhammer ELL, Eddy SR and Durbin R. (1997) Pfam: A Comprehensive Database of Protein Domain Families Based on Seed Alignments. Proteins 28:405-420.
Southern E. (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98:503-17
Spieth J, Brooke G, Kuersten S, Lea K and Blumenthal T. (1993) Operons in C. elegans: Polycistronic mRNA precursors are processed by trans-splicing of SL2 to downstream coding regions. Cell 73:521-532
Staden R. (1979) A strategy of DNA sequencing employing computer programs. Nucl. Acids Res. 6:2601-2610
Sternberg N. (1990) Bacteriophage P1 cloning system for the isolation, amplification and recovery of DNA fragments as large as 100 kilobase pairs. Proc. Natl. Acad. Sci. USA 87(1):103-107
Stefanini M, Scappaticci S, Lagomarsini P, Borroni G, Berardesca E, Nuzzo F (1989) Chromosome instability in lymphocytes from a patient with Werner syndrome is not associated with DNA repair defects. Mutat Res 219:179-185.
Stewart E, Chapman CR, Al-Khodairy F, Carr AM, Enoch T. (1997) rqh1+, a fission yeast gene related to the Bloom's and Werner syndrome genes, is required for reversible S-phase arrest.
EMBO J. 16(10):2682-2692
Sulston JE and Brenner S. (1974) The DNA of Caenorhabditis elegans. Genetics 77:95-104
Sung P, Bailly V, Weber C, Thompson LH, Prakash L and Prakash S. (1993) Human xeroderma pigmentosum group D gene encodes a DNA helicase. Nature 365:852-855
Suzuki D, Griffiths A, Miller J and Lewontin R. (1989) Chapter 1 in An Introduction to Genetic Analysis, 4th Edition, W.H. Freeman and Company, NY
Suzuki N, Shimamoto A, Imamura O, Kuromitsu J, Kitao S, Goto M and Furuichi Y. (1997) DNA helicase activity in Werner's syndrome gene product synthesized in a baculovirus system. Nucl. Acids Res. 25(15):2973-2978
Szende B, Srkalovic G, Timar J, Mulchahey JJ, Neill JD, Lapis K, Csikos A, Szepeshazi K and Schally AV. (1991) Localization of receptors for luteinizing hormone releasing hormone in pancreatic and mammary cancer cells. Proc. Natl. Acad. Sci. USA 88:4153-4156
Takeuchi F, Hanaoka F, Goto M, Akaoka I, Hori T, Yamada M, Miyamoto T. (1982) Altered frequency of initiation sites of DNA replication in Werner syndrome cells. Human Genet 60:365-368.
Tanaka K, Nakazawa T, Okada Y, Kumahara Y (1979) Increase in DNA synthesis in Werner syndrome cells by hybridization with normal human diploid and HeLa cells. Exp Cell Res 123:261-267.
Tatusov RL, Koonin EV and Lipman DJ. (1997) A Genomic Perspective on Protein Families. Science 278:631-637.
Thierry-Mieg J and Durbin R. (1990-98) ACEDB See URL:
Thomas W, Rubenstein M, Goto M, Drayna D. (1993) A genetic analysis of the Werner syndrome region on human chromosome 8p. Genomics 16(3):685-90.
Thompson JD, Higgins DG and Gibson TJ. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22:4673-4680
Timmons L and Fire A. (1998) Specific interference by ingested dsRNA.
Nature 395:854
Troelstra C, van Gool A, de Wit J, Vermeulen W, Bootsma D and Hoeijmakers JHJ. (1992) ERCC6, a member of a subfamily of putative helicases, is involved in Cockayne's syndrome and preferential repair of active genes. Cell 71:939-953
Tuteja N, Tuteja R, Ochem A, Taneja P, Huang NW, Simoncsits A, Susic S, Rahman K, Marusic L, Chen J, Zhang J, Wang S, Pongor S and Falaschi A. (1994) DNA unwinding enzyme identified as the Ku autoantigen. EMBO J 13(20):4991-5001.
Tuteja N, Huang NW, Skopac D, Tuteja R, Hrvatic S, Zhang J, Pongor S, Joseph G, Faucher C, Amalric F and Falaschi A. (1995) Human DNA helicase IV is nucleolin, an RNA helicase modulated by phosphorylation. Gene 160:143-148.
Umezu K, Nakayama K, Nakayama H. (1990) Escherichia coli RecQ protein is a DNA helicase. Proc. Natl. Acad. Sci. USA 87:5363-5367.
Villard L, Gecz J, Mattei JF, Fontes M, Saugier-Veber P, Munnich A and Lyonnet S. (1996) XNP mutation in a large family with Juberg-Marsidi syndrome. Nature Genet. 12:359-360
Wagner M, Ge Y, Siciliano M and Wells DE. (1991) A hybrid cell mapping panel for regional localization of probes to human chromosome 8. Genomics 10(1):114-125
Wang L, Hunt KE, Martin GM and Oshima J. (1998) Structure and function of the human Werner syndrome gene promoter: evidence for transcriptional modulation. Nucl. Acids Res. 26(15):3480-3485
Waterman MS, Smith TF and Beyer WA. (1976) Some biological sequence metrics. Adv. in Math. 20:367-387
Watt PM, Louis EJ, Borts RH, Hickson ID. (1995) Sgs1: a eukaryotic homolog of E. coli RecQ that interacts with topoisomerase II in vivo and is required for faithful chromosome segregation. Cell 81:253-260.
Watt PM, Hickson ID, Borts RH, Louis EJ. (1996) SGS1, a homolog of the Bloom's and Werner syndrome genes, is required for maintenance of genomic stability in Saccharomyces cerevisiae. Genetics 144:935-945.
Weeda G, van Ham RCA, Vermeulen W, Bootsma D, van der Eb AJ and Hoeijmakers JHJ.
(1990) A presumed DNA helicase encoded by ERCC-3 is involved in the human repair disorders xeroderma pigmentosum and Cockayne's syndrome. Cell 62:777-791
Weissenbach J, Gyapay G, Dib C, Vignal A, Morissette J, Millasseau P, Vaysseix G, Lathrop M. (1992) A second-generation linkage map of the human genome. Nature 359(6398):794-801
Webb DK, Evans MK, Bohr VA. (1996) DNA repair fine structure in Werner syndrome cell lines. Exp Cell Res 224:272-278.
Weber JL and May PE. (1989) Abundant class of human DNA polymorphism which can be typed using the polymerase chain reaction. Am. J. Hum. Genet. 44:388-96
Wicking C and Williamson B. (1991) From Linked Marker To Gene. Trends Genet. 7:288-93
Wilbur WJ and Lipman DJ. (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci USA 80(3):726-30
Wilson, R., Ainscough, R., Anderson, K., Baynes, C., Berks, M., Bonfield, J., Burton, J., Connell, M., Copsey, T., Cooper, J., Coulson, A., Craxton, M., Dear, S., Du, Z., Durbin, R., Favello, A., Fraser, A., Fulton, L., Gardner, A., Green, P., Hawkins, T., Hillier, L., Jier, M., Johnston, L., Jones, M., Kershaw, J., Kirsten, J., Laisster, N., Latreille, P., Lightning, J., Lloyd, C., Mortimore, B., O'Callaghan, M., Parsons, J., Percy, C., Rifken, L., Roopra, A., Saunders, D., Shownkeen, R., Sims, M., Smaldon, N., Smith, A., Smith, M., Sonnhammer, E., Staden, R., Sulston, J., Thierry-Mieg, J., Thomas, K., Vaudin, M., Vaughan, K., Waterston, R., Watson, A., Weinstock, L., Wilkinson-Sproat, J., and Wohldman, P. (1994) 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368:32-38.
Wood S, Schertzer M, Drabkin H, Petterson D, Longmire J and Deaven L. (1992) Characterization of a human chromosome 8 cosmid library constructed from flow-sorted chromosomes. Cytogenet. Cell. Genet. 59:243-247
Yamagata K, Kato J, Shimamoto A, Goto M, Furuichi Y and Ikeda H.
(1998) Bloom's and Werner's syndrome genes suppress hyperrecombination in yeast sgs1 mutant: implication for genomic instability in human diseases. Proc. Natl. Acad. Sci. USA 95(15):8733-8738
Yan H, Chen CY, Kobayashi R and Newport J. (1998) Replication focus-forming activity 1 and the Werner syndrome gene product. Nature Genet. 19:375-378
Yang-Feng TL, Seeburg PH and Francke U. (1986) Human luteinizing hormone-releasing hormone gene (LHRH) is located on short arm of chromosome 8 (region 8p11.2-p21). Somat. Cell. Molec. Genet. 12:95-100.
Ye L, Nakura J, Mitsuda N, Fujioka Y, Kamino K, Ohta T, Jinno Y, Niikawa N, Miki T, Ogihara T. (1995) Genetic Association Between Chromosome 8 Microsatellite MS8-134 and Werner syndrome WRN - Chromosome Microdissection And Homozygosity Mapping. Genomics 28(3):566-569
Yin H, Cheng KW, Hwa HL, Peng C, Auersperg N and Leung PC. (1998) Expression of the messenger RNA for gonadotropin-releasing hormone and its receptor in human cancer cell lines. Life Sci 62(22):2015-23
Yu C-E, Oshima J, Fu Y-H, Goddard KAB, Miki T, Nakura J, Ogihara T, Poot M, Hoehn H, Fraccaro M, Piussan C, Martin GM, Schellenberg GD and Wijsman EM. (1994) Linkage Disequilibrium and Haplotype Studies of Chromosome 8p11.1-21.1 Markers and Werner syndrome. Am. J. Hum. Genet. 55:356-364.
Yu C-E, Oshima J, Fu Y-H, Wijsman EM, Hisama F, Alisch R, Matthews S, Nakura J, Miki T, Ouais S, Martin GM, Mulligan J, Schellenberg GD. (1996a) Positional cloning of the Werner syndrome gene. Science 272:258-262.
Yu C-E, Oshima J, Hisama F, Matthews S, Trask BJ and Schellenberg GD. (1996b) A YAC, P1, and Cosmid Contig and 17 New Polymorphic Markers for the Werner syndrome Region at 8p12-p21. Genomics 35:431-440
Yu CE, Oshima J, Wijsman EM, Nakura J, Miki T, Piussan C, Matthews S, Fu YH, Mulligan J, Martin GM and Schellenberg GD. (1997) Mutations in the consensus helicase domains of the Werner syndrome gene. Am. J. Hum. Genet. 60(2):330-41

APPENDICES

A.
Published Wagner/Sapru Hybrid Panel Map

[Figure: CHROMOSOME 8 RADIATION HYBRID PANEL. Hybrid cell lines: 211, 1D1, 1E1, 2A6, 2B1, 2D1, 2F4, 2N2, 2Q1, 2R1, 2T1, 2V1]

FIG. 1. STS content of selected radiation hybrids. For each cell line, the presence of single or multiple adjacent sequence tagged sites is indicated by a vertical bar. The standard cytogenetic banding pattern for chromosome 8 is shown at the left, and the loci tested are placed within the intervals A through I defined by the somatic cell hybrid mapping panel of Wagner et al. (1991) and ordered within those intervals as described in the text.

B. CEPH Reference Family Pedigrees (Courtesy of the Coriell Cell Repository WWW site: )

[Pedigree diagrams: CEPH/Venezuelan Pedigree 102; CEPH/Amish Pedigree 884; CEPH/Utah Pedigrees 1331, 1332, 1347, 1362, 1413 and 1416]

C. CEPH Reference Family Genotypes for STRPs Characterized in this Thesis

1. LHRH

No.   CEPH No.   Position   Allele A             Allele B
                            Identity   Length    Identity   Length
1     102-01     FA         7          233       13         221
2     102-02     MO         7          233       30         187
3     102-03     CH         7          233       30         187
4     102-04     CH         7          233       30         187
5     102-05     CH         7          233       30         187
6     102-07     CH         7          233       13         221
7     102-08     CH         13         221       30         187
8     102-09     CH         7          233       7          233
9     102-10     CH         13         221       30         187
10    102-11     CH         13         221       30         187
11    102-12     CH         7          233       7          233
12    102-13     CH         7          233       7          233
13    102-14     CH         7          233       13         221
14    102-15     CH         7          233       13         221
15    102-16     CH         13         221       30         187
16    884-01     FA         5          237       6          235
17    884-02     MO         4          239       12         223
18    884-03     CH         5          237       12         223
19    884-04     CH         4          239       5          237
20    884-06     CH         6          235       12         223
21    884-07     CH         4          239       5          237
22    884-08     CH         5          237       12         223
23    884-09     CH         6          235       12         223
24    884-10     CH         4          239       6          235
25    884-11     CH         4          239       5          237
26    884-12     CH         4          239       5          237
27    884-13     CH         4          239       6          235
28    884-14     CH         4          239       5          237
29    884-16     FM         6          235       14         219
30    884-17     MF         4          239       6          235
31    884-18     MM         6          235       12         223
32    1332-01    FA         4          239       4          239
33    1332-02    MO         3          241       30         187
34    1332-04    CH         3          241       4          239
35    1332-05    CH         3          241       4          239
36    1332-06    CH         4          239       30         187
37    1332-08    CH         3          241       4          239
38    1332-10    CH         3          241       4          239
39    1332-11    CH         3          241       4          239
40    1332-12    CH         4          239       30         187
41    1332-13    FF         4          239       9          229
42    1332-14    FM         4          239       9          229
43    1332-15    MF         3          241       5          237
44    1332-16    MM         13         221       30         187
45    1332-17    CH         3          241       4          239
46    1347-01    FA         4          239       13         221
47    1347-02    MO         2          243       5          237
48    1347-05    CH         5          237       13         221
49    1347-06    CH         2          243       4          239
50    1347-07    CH         5          237       13         221
51    1347-09    CH         2          243       4          239
52    1347-10    CH         5          237       13         221
53    1347-12    FF         8          231       13         221
54    1347-14    MF         2          243       12         223
55    1347-15    MM         5          237       13         221
56    1347-16    CH         5          237       13         221
57    1362-01    FA         13         221       13         221
58    1362-02    MO         6          235       30         187
59    1362-03    CH         13         221       30         187
60    1362-05    CH         6          235       13         221
61    1362-06    CH         6          235       13         221
62    1362-08    CH         6          235       13         221
63    1362-09    CH         6          235       13         221
64    1362-10    CH         6          235       13         221
65    1362-11    CH         6          235       13         221
66    1362-12    CH         6          235       13         221
67    1362-13    FF         7          233       13         221
68    1362-14    FM         7          233       13         221
69    1362-15    MF         13         221       30         187
70    1362-16    MM         6          235       14         219
71    1362-17    CH         6          235       13         221
72    1413-01    FA         5          237       6          235
73    1413-02    MO         13         221       13         221
74    1413-06    CH         6          235       13         221
75    1413-07    CH         5          237       13         221
76    1413-09    CH         5          237       13         221
77    1413-11    CH         5          237       13         221
78    1413-12    CH         5          237       13         221
79    1413-14    CH         6          235       13         221
80    1413-15    CH         6          235       13         221
81    1413-16    CH         6          235       13         221
82    1413-18    FM         5          237       7          233
83    1413-19    MM         5          237       13         221
84    1416-01    FA         9          229       13         221
85    1416-02    MO         5          237       13         221
86    1416-03    CH         13         221       13         221
87    1416-04    CH         5          237       13         221
88    1416-05    CH         13         221       13         221
89    1416-06    CH         5          237       13         221
90    1416-07    CH         5          237       9          229
91    1416-08    CH         13         221       13         221
92    1416-10    CH         13         221       13         221
93    1416-11    FF         7          233       9          229
94    1416-13    MF         13         221       13         221
95    1416-14    MM         5          237       7          233
96    1416-15    CH         9          229       13         221
97    1331-01    FA         5          237       13         221
98    1331-02    MO         5          237       6          235
99    1331-03    CH         5          237       13         221
100   1331-04    CH         5          237       5          237
101   1331-05    CH         5          237       5          237
102   1331-06    CH         5          237       6          235
103   1331-07    CH         5          237       5          237
104   1331-08    CH         5          237       5          237
105   1331-09    CH         5          237       5          237
106   1331-10    CH         6          235       13         221
107   1331-11    CH         5          237       13         221
108   1331-13    FM         5          237       7          233
109   1331-14    MF         6          235       8          231
110   1331-15    MM         5          237       13         221
111   1331-16    CH         5          237       6          235
112   1331-17    CH         5          237       5          237

2. cos53C3PA

No.   CEPH No.   Position   Allele A             Allele B
                            Identity   Length    Identity   Length
1     102-01     FA         5          88        6          86
2     102-02     MO         5          88        6          86
3     102-03     CH         5          88        6          86
4     102-04     CH         5          88        6          86
5     102-05     CH         5          88        6          86
6     102-07     CH         5          88        6          86
7     102-08     CH         5          88        5          88
8     102-09     CH         6          86        6          86
9     102-10     CH         5          88        5          88
10    102-11     CH         5          88        5          88
11    102-12     CH         6          86        6          86
12    102-13     CH         6          86        6          86
13    102-14     CH         5          88        6          86
14    102-15     CH         5          88        6          86
15    102-16     CH         5          88        5          88
16    884-01     FA         2          94        3          92
17    884-02     MO         4          90        4          90
18    884-03     CH         2          94        4          90
19    884-04     CH         2          94        4          90
20    884-06     CH         3          92        4          90
21    884-07     CH         2          94        4          90
22    884-08     CH         2          94        4          90
23    884-09     CH         3          92        4          90
24    884-10     CH         3          92        4          90
25    884-12     CH         2          94        4          90
26    884-13     CH         3          92        4          90
27    884-14     CH         2          94        4          90
28    884-15     FF         1          96        2          94
29    884-17     MF         4          90        4          90
30    884-18     MM         1          96        4          90
31    1332-01    FA         2          94        4          90
32    1332-02    MO         3          92        7          84
33    1332-04    CH         2          94        7          84
34    1332-05    CH         2          94        7          84
35    1332-08    CH         2          94        7          84
36    1332-10    CH         2          94        7          84
37    1332-11    CH         4          90        7          84
38    1332-12    CH         3          92        4          90
39    1332-14    FM         2          94        3          92
40    1332-15    MF         7          84        8          82
41    1332-16    MM         3          92        5          88
42    1332-17    CH         2          94        7          84
43    1347-01    FA         5          88        7          84
44    1347-02    MO         2          94        3          92
45    1347-05    CH         3          92        7          84
46    1347-06    CH         2          94        5          88
47    1347-07    CH         3          92        7          84
48    1347-09    CH         2          94        5          88
49    1347-10    CH         3          92        7          84
50    1347-12    FF         6          86        7          84

2. cos53C3PA (cont'd) No. CEPH No.
Position Allele A Allele B Identity Length Identity Length 51 1347-14 MF 2 94 3 92 52 1347-15 MM 3 92 7 84 53 1347-16 CH 3 92 7 84 54 1362-01 FA 4 90 4 90 55 1362-02 MO 2 94 5 88 56 1362-03 CH 4 90 5 88 57 1362-05 CH 2 94 4 90 58 1362-06 CH 2 94 4 90 59 1362-08 CH 2 94 4 90 60 1362-09 CH 2 94 4 90 61 1362-10 CH 2 94 4 90 62 1362-11 CH 2 94 4 90 63 1362-12 CH 2 94 4 90 64 1362-13 FF 2 94 4 90 65 1362-14 FM 4 90 4 90 66 1362-15 MF 5 88 6 86 67 1362-16 MM 2 94 3 92 68 1362-17 CH 2 94 4 90 69 1413-01 FA 2 94 6 86 70 1413-02 MO 5 88 7 84 71 1413-06 CH 2 94 5 88 72 1413-07 CH 6 86 7 84 73 1413-09 CH 5 88 6 86 74 1413-12 CH 6 86 7 84 75 1413-14 CH 6 86 7 84 76 1413-15 CH 6 86 7 84 77 1413-16 CH 5 88 6 86 78 1413-18 FM 2 94 6 86 79 1413-19 MM 5 88 5 88 80 1416-01 FA 5 88 6 86 81 1416-02 MO 2 94 8 82 82 1416-03 CH 5 88 8 82 83 1416-04 CH 2 94 5 88 84 1416-05 CH 5 88 8 82 85 1416-06 CH 2 94 5 88 86 1416-07 CH 2 94 6 86 87 1416-08 CH 5 88 8 82 88 1416-10 CH 5 88 8 82 89 1416-11 FF 5 88 6 86 90 1416-13 MF 6 86 8 82 91 1416-14 MM 2 94 6 86 92 1416-15 CH 6 86 8 82 93 1416-16 CH 2 94 6 86 210 2. D8S2297 Allele A Allele B No. CEPH No. 
Position Identity Length Identity Length 1 102-01 FA 2 129 3 127 2 102-02 M O 1 131 7 119 3 102-03 C H 3 127 7 119 4 102-05 C H 3 127 7 119 5 102-07 C H 1 131 2 129 6 102-08 C H 2 129 7 119 7 102-09 C H 1 131 3 127 8 102-10 C H 2 129 7 119 9 102-11 C H 2 129 7 119 10 102-12 C H 1 131 3 127 11 102-13 C H 1 131 3 127 12 102-14 C H 1 131 2 129 13 102-15 C H 1 131 3 127 14 102-16 C H 2 129 7 119 15 884-01 FA 9 115 10 113 16 884-02 M O 9 115 10 113 17 884-04 C H 9 115 10 113 18 884-06 C H 9 115 10 113 19 884-07 C H 9 115 10 113 20 884-08 C H 9 115 9 115 21 884-10 C H 9 115 10 113 22 884-12 C H 9 115 10 113 23 884-13 C H 10 113 10 113 24 884-14 C H 9 115 9 115 25 884-15 FF 7 119 9 115 26 884-17 MF 7 119 10 113 27 1413-01 FA 7 119 7 119 28 1413-02 M O 4 125 11 111 29 1413-06 C H 4 125 7 119 30 1413-07 C H 7 119 11 111 31 1413-09 C H 4 125 7 119 32 1413-11 C H 7 119 11 111 33 1413-12 C H 7 119 11 111 34 1413-14 C H 4 125 7 119 35 1413-16 C H V 4 125 7 119 36 1413-18 F M 4 125 7 119 37 1413-19 MM 2 129 4 125 211 D. Functional Analysis Work Undertaken in Caenorhabditis elegans This Appendix summarizes work undertaken in C elegans during the thesis period which was not ultimately incorporated into the primary body of the thesis. Introduction: Functional Analysis in a Model System A reasonable next step subsequent to the positional cloning of WRN could be to functionally characterize WRN. This was the objective chosen for the second phase of this thesis project. One of the more powerful approaches to functional characterization of a human gene is to match the gene by sequence similarity to another simpler, more genetically or developmentally tractable, model system. Thus, a decision was taken to study the function of DExH helicases in one such model system, Caenorhabditis elegans. a) Features of the Caenorhabditis elegans Experimental System The nematode, C. 
elegans, is currently the focus of intense interest in zoology and genetics (Brenner, 1974; Epstein and Shakes, 1995; Riddle, 1997). The advantages of this organism as an experimental system include:
• That nematodes are among the simplest metazoan animals known, yet advanced enough to provide critical information about metazoan development, cell-cell interactions and the structural anatomy of a variety of differentiated cell types including muscle, neuronal, hypodermal, intestinal, glandular and reproductive cells. The worm also has numerous gene families, genetic networks and cell signaling (including hormonal) pathways conserved in a general way in higher organisms.
• That the worm is easily cultivated on a lawn of bacteria growing on a Petri dish. Generation times are short and temperature dependent. Worms may be preserved by low temperature freezing, with good longevity and revival statistics.
• That most C. elegans worms are self-fertilizing XX hermaphrodites and may be maintained indefinitely, bred to homozygosity. Mutations may be similarly maintained in balanced heterozygote strains. Male worms appear spontaneously at low frequency, by X chromosome loss, thus permitting inter-strain genetic crosses.
• That the nematode has a well-characterized deterministic developmental fate map and cell lineage of exactly 959 somatic nuclei in the adult worm. The worm's body is translucent and the orientation of the worm on Petri dishes is optimal for examination of internal organs. Cell ablation and genetic marking have elucidated the cell lineage to a large extent.
• That the nematode genome is relatively small, and thus provides for relatively tractable genetic and physical mapping, including the aforementioned genome sequencing project, which was undertaken internationally and is now due to be completed later this year (Wilson et al., 1994).
• That numerous established techniques for mutagenesis, reverse genetics and molecular analysis are now available for this model system.

The above biological features and the popularity of the nematode as an experimental system for senescence research (Klass, 1977; Johnson and Hutchinson, 1990), combined with the identification by the nematode sequencing project of several putative genetic loci homologous to the WRN gene, make this model system an attractive one for functional analysis of the WRN gene family. In addition, significant research expertise and resources for C. elegans research are available at UBC.

b) The Isolation of a Phenotype from a Genotype

Classical genetic analysis proceeds most often from the description of a (mutant) phenotype towards the isolation of the underlying genotype (mutation), via an intermediate stage of genetic mapping, cloning and sequencing. With the advent of totally sequenced genomes, in particular the complete or nearly complete sequences of several model prokaryote and eukaryote genomes, the strategy of going from a genotype (i.e. mutagenized sequence) to the phenotype is becoming more central in genetics research (Plasterk, 1992). The basic objective of reverse genetics is to identify or generate the phenotype associated with a mutant genotype of a given known (sequenced) locus.

The easiest and quickest approach, although one not guaranteed a priori to yield a mutant, is simply to screen known mutant strains on the genetic map spanning the sequenced locus of interest. In C. elegans, such experiments are greatly assisted by the phenomenal effort which has gone into collecting and storing mutant nematode strains at a central repository, the "Caenorhabditis elegans Genetics Centre" ("CGC") at the University of Minnesota. Adding to this resource is the bountiful genetic map data of the nematode research community, compiled into a common database repository, the "A Caenorhabditis elegans Database" (ACEDB). In this thesis project, such an initial screen was undertaken employing a strategy of "phenotype rescue" (McDowell and Rose, 1997). This experiment crossed a wild-type worm carrying a transgenic array of cosmid DNA (Janke et al., 1997) spanning a WRN homologous locus of interest against one of a set of known mutant worms carrying mutations mapping to the genetic interval containing the cosmid (Bruskiewich, Rose and Wood, 1997).

In the case of prokaryotes and yeast, homologous recombination proceeds with high efficiency, providing many opportunities for precisely targeted reverse genetics by recombinant or site directed mutagenesis of loci. Unfortunately, it has not yet been possible to induce homologous recombination artificially in C. elegans; alternative techniques for reverse genetics have therefore been developed. These have included mutagenesis with chemical mutagens (Jansen et al., 1997), Tc1 transposon insertional mutagenesis (Plasterk and Groenen, 1992), and the generation of phenocopies using dsRNA interference (Timmons and Fire, 1998).

Materials & Methods

The second phase of this thesis project involved the functional analysis of WRN in the model metazoan, Caenorhabditis elegans.

a) Maintenance of Worm Stocks

Worms were maintained on NGM plates seeded with E. coli OP50 bacterial lawns. Worms were incubated at either 16°C or 20°C for a number of days appropriate to the nature of the particular experiment. In most cases, plates were colonized with a single L4 larval stage hermaphrodite transferred from a source plate using a platinum wire pick sterilized in an alcohol burner flame, guided by observation with a binocular dissecting microscope. Decontamination of plates, when necessary, was performed by rapid sequential transfer of single worms to fresh plates. Specific isolates of nematodes (e.g. transgenic strains, results of crosses) were generally stored in long term deep freeze storage using standard protocols.
Briefly, the worms from a slightly overgrown plate, ideally containing thousands of slightly starved L1 and L2 larvae, were collected with small aliquots of M9 buffer (Brenner, 1974) into a sterile tube. An equal volume of freezing solution (Sulston and Brenner, 1974) was added prior to transferring the resuspended worms into four (4) marked 1.8 ml Nunc cryogenic tubes. The capped tubes were placed into a foam freezing box and put at -70°C. After freezing, one of the tubes was immediately thawed and inoculated onto a fresh plate to assess the freeze survival status of the worms. The remaining tubes were kept at -70°C or under liquid nitrogen until required.

i) NGM Media

3 g of NaCl, 2.5 g of Bacto-peptone and 17 g of agar are autoclaved in 1 litre of water. Upon cooling of the media to 55°C, add the following in order, with swirling and sterile technique: 1 ml of cholesterol (5 mg/ml in EtOH), 1 ml of 1 M CaCl2, 1 ml of 1 M MgSO4, and 25 ml of 1 M KH2PO4 (pH 6.0). Pour 5 cm Petri plates half full and flame the surface of the agar to remove air bubbles, which permit nematodes to burrow into the agar. Seed with a streak of an overnight culture of OP50, incubate overnight and refrigerate plates at 4°C until used (Lewis and Fleming, 1995).

ii) M9 Buffer

Dissolve in dH2O and autoclave:
6 g Na2HPO4
3 g KH2PO4
5 g NaCl
0.25 g MgSO4·7H2O

iii) Freezing Medium

20 ml of 1 M NaCl
10 ml of 1 M KH2PO4 (pH 6.0)
60 ml of 100% glycerol
Add dH2O to 200 ml. Autoclave. Add 0.6 ml of sterile 0.1 M MgSO4.

b) WRN Candidate Homologous Loci in C. elegans

Candidate WRN homologous loci in the nematode were initially identified as BLAST hits against the DNA (finished and unfinished) sequence and predicted protein ("Wormpep") databases of the nematode genome sequencing project (Wilson et al., 1994) underway at the Sanger Centre (Hinxton, UK) and at the Washington University School of Medicine, Genome Sequencing Centre (St. Louis, MO, USA).
Additional information about these candidate loci was obtained by retrieval of related data in ACEDB (Thierry-Mieg and Durbin, 1996). Cosmid clones initially identified as corresponding to WRN homologous loci (F18C5, K02F3 and E03A3) were obtained from the Sanger Centre and grown, and DNA was prepared. Antibiotic selection for the clones was 50 µg/ml kanamycin for Lorist vector-based cosmids and 120 µg/ml ampicillin for pJB8 vector-based cosmids.

c) F18C5.2 and Associated Mutant Strains

One candidate homolog, F18C5.2, was deemed by preliminary analysis of sequence similarity to be the most likely ortholog of WRN in the worm and was thus initially selected for detailed study in this project. A 6.4 kilobase PstI fragment containing the complete F18C5.2 ACEDB Genefinder predicted coding region was successfully isolated. Two PCR systems were also devised to amplify the 5' and the 3' ends of this locus.

Mutant strains obtained for transgenic rescue experiments were previously characterized strains corresponding to loci localized upon the nematode genetic map (ACEDB) in the vicinity of F18C5.2 and accessioned at the Caenorhabditis elegans Genetics Centre (CGC; University of Minnesota). All the strains in this region of the map were available as mutant chromosomes balanced against the mnC1(II) dominant crossover suppressor. The reference strain for mnC1 is SP127, unc-4(e120)/mnC1[dpy-10(e128) unc-52(e444)].

d) Transgenic Strains

Cosmid transgenic strains using cosmid DNA spanning the nematode candidate WRN homologous locus, F18C5.2, were constructed by Jacquie Schein for this project, courtesy of the David Baillie laboratory at Simon Fraser University (Janke et al., 1997). Because of uncertainty about the location of the promoter for F18C5.2, a cosmid clone, C50C12, spanning the C56E6 and F18C5 sequence boundary, was employed for transgenic construction.
For this purpose, C50C12 DNA isolated by large scale culture and maxi-prep (CsCl purified) was sent to the Baillie lab for transgenic construction. Specifically, 20 ng/µl of C50C12 cosmid DNA was co-injected with 80 ng/µl of pCes1943{rol-6(su1006dm)} into BC842, a wild-type N2 male strain. The plasmid insert rol-6(su1006dm) is an EcoRI insert from pRF4. Individual F1 roller worms were picked, and worm lines were retained which were stably transgenic and carried the cosmid as confirmed by vector PCR. Worms which were isolated and characterized for further analysis were those exhibiting the Rol-6 roller phenotype and testing positive by PCR for the cosmid in question. These transgenic strains were maintained by plating and accessioned as frozen stocks in the laboratory of Dr. Ann Rose.

e) Longevity Study

Given the apparent Werner syndrome phenotype of premature senescence, one possible consequence of an increased dosage of the gene in the nematode could be a modification in worm longevity. A simple longitudinal study of transgenic worm longevity was therefore undertaken. About 20 gravid roller phenotype and 20 gravid apparent wild-type adult hermaphrodite worms from the transgenic population were plated on separate plates and incubated for 6 hours at 20°C. The adult worms were then removed from the plates. Eggs were left to hatch and the progeny were monitored to note first egg lay. The 10 plates of young adults were then re-plated, 10 worms per plate. After two days, these adult worms were sequentially replated each day to fresh plates until no more egg laying occurred. The time course of death of all the post-reproductive worms was then noted.

f) Crosses for Phenotype Rescues by Transgenic Arrays

Roller transgenic worms were crossed against mnC1 balanced candidate mutant strains as follows.
First, wild-type male worms were crossed on "mating plates" (plates with only a small limiting dot of bacteria in the centre of the plate) with C50C12 Rol-6 transgenic worms, to generate a stock of transgenic C50C12 Rol-6 males. These transgenic males were then crossed on mating plates with the balanced candidate mutant strains. F1 roller hermaphrodites were then selfed and the F2 screened for potentially rescued mutations (see Figure 42), rejecting recombinants. For lethal ("let") mutant loci, the expected outcome (in the absence of recombination) would be a population of 100% roller unc-4's from F2 roller unc-4 parents. For maternal effect lethals, some non-roller unc-4's would be observed but these would lay dead eggs: only roller unc-4's would lay viable eggs.

[Figure 42]

Results

Of the research questions to ask in this context, a fundamental one is "what is the phenotype exhibited by C. elegans with a loss of function mutation in "the" WRN orthologous locus?" Given a mutant strain exhibiting such a phenotype, genetic screens would then be possible to elucidate, for example, epistatic relationships between WRN and other genes.
In this section, the outcome of a preliminary screen for a WRN locus mutant phenotype in C. elegans is described.

1. WRN Homologous Loci in Caenorhabditis elegans

WRN was identified as a putative member of the Escherichia coli "RecQ" DExH DNA helicase family. A BLAST query to the sequence databases of the nematode genome project (both finished and unfinished DNA sequences; Wormpep (Sonnhammer and Durbin, 1997)) undertaken in mid-1996 uncovered three matches to the canonical helicase motifs of WRN, plus a number of more distant matches. A fourth good homolog was subsequently identified as a putative ORF in the incompletely sequenced cosmid T04A11 (Table 23 through Table 25; Figure 45 through Figure 48).

Table 23. C. elegans Cosmid Loci Exhibiting High Sequence Similarity to WRN

Gene Locus   Map Location         Cosmid Vector                        Remarks
F18C5.2      II[lin-23..mua-1]    Lorist6, Sau3AI partial
K02F3.1      III[]                LoristB, Sau3AI partial              Sequence may be unfinished; predicted gene truncated?
E03A3.2      III[pat-3..mpk-1]    pJB8, Sau3AI partial, Eco K- host
T04A11.6     IV[dpy-26..ham-1]    Lorist2, Sau3AI partial              Originally identified in "unfinished" sequences

Table 24. WRN BLAST Search Results Against C. elegans WormPep Database

Query = gb|L76937|HUMDR-Homo sapiens Werner syndrome cDNA (5189 letters)
Program: blastx (translating both strands of query sequence in 6 reading frames)
Database: C. elegans predicted protein ("WormPep") database (Sanger Centre, March 1997); 7299 sequences; 3,284,741 total letters.

Sequences producing High-scoring Segment Pairs   High Score   Smallest Sum Probability P(N)   N
F18C5.2 DNA HELICASE                             161          9.6e-69                         10
E03A3.2 DNA HELICASE                             250          9.0e-65                         4
K02F3.1 RECQ SW:P46064                           182          1.4e-52                         5
R05D11.4 RNA HELICASE                            136          7.6e-11                         3
F01F1.7 RNA HELICASE                             118          4.8e-08                         4
C07H6.5 RNA HELICASE                             116          5.9e-07                         1
F57B9.6 INF-1: EIF-4A SW:P27639                  114          1.1e-06                         1
T07D4.4B HELICASE                                80           3.3e-06                         4
T07D4.4A HELICASE                                80           1.6e-05                         4

Table 25. WRN BLAST Search Results Against C. elegans DNA Sequence Database

Query = gb|L76937|HUMDR-Homo sapiens Werner syndrome cDNA (5189 letters)
Program: blastn
Database: C. elegans finished and unfinished sequence database (Sanger Centre, March 1997); 4897 sequences; 88,038,132 total letters.

Sequences producing High-scoring Segment Pairs   High Score   Smallest Sum Probability P(N)   N
F18C5                                            422          3.1e-24                         1
Cosmid=T04A11; Contig ID=01175; Length=27418     279          7.5e-12                         2
K02F3                                            182          5.1e-08                         2
E03A3                                            155          0.11                            2

Figure 43. MSA Spanning Helicase Domain VI of WRN Homologous Genes
[Alignment of RecQ (E. coli), Sgs1 (S. cerevisiae), WRN (H. sapiens), F18C5.2 (C. elegans), BLM (H. sapiens), T04A11.6 (C. elegans), RecQL (H. sapiens), K02F3.1 (C. elegans), gi1727557 (H. sapiens) and E03A3.2 (C. elegans)]
Legend: residues 100% conserved in all homologous loci are marked; underlined residues are conserved only between loci proposed to be orthologous; italic residues are conserved only in RecQ, WRN and F18C5.2.

Figure 44. ClustalW Tree from Preliminary Domain VI MSA
[Tree of RecQ, WRN, F18C5.2, Sgs1, E03A3.2, gi1727557, BLM, T04A11, RecQL and K02F3.1]
Bootstrap values shown as percentages; 1000 trials with seed# 111; branch lengths are proportional to inferred evolutionary distances.

Figure 45. ACEDB Gene Feature Map of F18C5.2
The ACEDB "fmap" display of the cosmid F18C5, zooming in to F18C5.2, the RecQ family homologous locus. The graphic shows the predicted exon/intron structure based both upon ACEDB Genefinder and biological analysis of the sequence (helicase motif conservation). Multiple boxes to the right of the gene represent BLAST similarity matches, EST alignments and Caenorhabditis briggsae aligned sequences. ESTs and WormPep matches are listed to the right of the locus label. Similar notes apply to the next three figures.

Figure 46. ACEDB Gene Feature Map of T04A11.6

Figure 47. ACEDB Gene Feature Map of K02F3.1

Figure 48. ACEDB Gene Feature Map of E03A3.2

Preliminary multiple alignments (using a ClustalW analysis site on the WWW and manual alignment of residues using BLAST output results) suggested that a one-to-one ortholog mapping might be possible between the nematode helicases and WRN or the other known human paralogs to WRN (Figure 43).
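As a rough illustration of how a preliminary alignment can point towards one-to-one pairings, the following Python sketch computes pairwise percent identity over pre-aligned motif strings and reports reciprocal closest partners. The sequence names and residues are invented toy data, not the Domain VI alignment of Figure 43, and reciprocal best identity is only a simplifying stand-in for the ClustalW and manual analysis actually used here.

```python
# Toy, hypothetical aligned motifs ("-" would mark a gap); NOT the Figure 43 data.
TOY_ALIGNMENT = {
    "human_A": "SRAGRDGLPS",
    "worm_A": "SRAGRDGSPS",
    "human_B": "GRAGRDEISH",
    "worm_B": "GRAGRDMPSY",
}


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def best_partner(alignment):
    """Map each sequence name to the other sequence with the highest identity."""
    best = {}
    for name, seq in alignment.items():
        scored = [
            (percent_identity(seq, other_seq), other)
            for other, other_seq in alignment.items()
            if other != name
        ]
        # max() on (identity, name) tuples; ties fall to the alphabetically last name
        best[name] = max(scored)[1]
    return best


def reciprocal_pairs(alignment):
    """Return sorted pairs of sequences that are each other's best partner."""
    best = best_partner(alignment)
    pairs = {tuple(sorted((a, b))) for a, b in best.items() if best[b] == a}
    return sorted(pairs)


if __name__ == "__main__":
    for first, second in reciprocal_pairs(TOY_ALIGNMENT):
        identity = percent_identity(TOY_ALIGNMENT[first], TOY_ALIGNMENT[second])
        print(f"{first} <-> {second}: {identity:.0f}% identity")
```

A short, highly conserved motif carries little phylogenetic signal on its own, so a pairing suggested this way would still need support from a fuller alignment and tree.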
a) T04A11.6

In 1996, I identified a nematode helicase locus, subsequently denoted as T04A11.6, among the unfinished pre-annotated sequences at the Sanger Centre. Serendipitously, this locus was brought to the attention of Dr. Chantal Wicky, a postdoctoral researcher in the Rose lab at UBC, who was attempting a phenotype rescue of Him-6 with cosmid T04A11. Subsequent experimentation by Dr. Wicky confirmed that Him-6 is a mutation in T04A11.6. The molecular phylogeny analysis of this thesis project (see below) supports the hypothesis that T04A11.6 is the nematode ortholog to the Bloom's syndrome locus, BLM.

b) K02F3.1

Preliminary inspection of this locus (Figure 47) indicated that the most highly conserved domain VI motif of the core helicase module is missing from the ACEDB Genefinder predicted exons publicly accessioned in the nematode and external sequence databases. Closer examination of the sequence revealed a weak "CG" donor-based splice site that, if active, would incorporate an additional exon containing this highly conserved motif. In addition, there is some evidence that the 5' end of this published sequence may not form part of this predicted helicase. Sequence data for an EST, yk121g5.3, suggests that the first four exons of K02F3.1 may be a separate gene with a termination codon at 33724 (John Spieth, WUSTL/GSC, personal communication). This observation is consistent with Figure 36, a multiple sequence alignment between K02F3.1 and its corresponding putative human ortholog, RECQL, that reveals a 402-residue N-terminal protein leader in the nematode that is not conserved in the human gene.

c) E03A3.2

Comparative sequence analysis of helicase domain VI (Figure 43) suggests that E03A3.2 is divergent from other nematode and human WRN homologous loci. When initially examined in 1996, this locus did not have any recorded nematode ESTs, nor did the preliminary BLAST searches suggest the existence of a candidate human ortholog.
A more recent release of the nematode data reports a match to a worm EST, yk146e7, and BLAST searches of the human dbEST database pull out a suggestive match to a human EST (Entrez id gi1727557) which could represent an as yet uncharacterized human helicase gene orthologous to E03A3.2.

d) F18C5.2

F18C5.2 was the C. elegans cosmid locus originally reported as homologous to the Werner syndrome gene in the paper announcing the positional cloning of WRN (Yu et al., 1996). As already noted in that paper, close comparison of the ACEDB Genefinder predicted F18C5.2 gene against the WRN protein sequence suggests that one coding region (spanning F18C5 nucleotides 10240 through 10427), a domain highly conserved in other helicases, was missed by the Genefinder algorithm. This missing exon was included in all analyses for this thesis project and is accounted for in Figure 45.

2. A Cosmid System Containing a WRN Homologous Locus

The three fully sequenced cosmid clones (F18C5, K02F3 and E03A3) were obtained from the Sanger Centre and an attempt was made to grow the clones for characterization. Technical difficulties were experienced in growing the clones, but DNA from one of the clones, F18C5, was eventually successfully obtained (Figure 49). Since preliminary MSA analysis (Figure 43) suggested that F18C5.2 might be the WRN ortholog in the worm, research efforts were focused upon this cosmid.

To provide a quick means of detecting the F18C5.2 locus in cloned DNA, PCR primer systems were devised which specifically amplify the 5' and 3' ends of the locus. The 5' end specific system, yielding a 508 bp product, consists of the forward primer 5'-GACTATTTCTCATTTCCTCCCACG-3' and the reverse primer 5'-CTGTGTCTCATCACAAGCTGGG-3'. The 3' end specific system, yielding a 517 bp product, consists of the forward primer 5'-CTGCTCGAGGAATTAGTGAGGG-3' and the reverse primer 5'-CAGCATTCATATAGACAGGGATGAG-3'.
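For orientation, basic properties of the four primers listed above can be tabulated with a short Python sketch. The primer sequences are taken from the text with OCR spacing removed; the dictionary keys are illustrative labels, and the Wallace rule (2°C per A/T, 4°C per G/C) is a crude estimate intended for short oligonucleotides. It is not stated in the text that this rule was used to design the primers or choose the cycling temperatures.

```python
# Primer sequences as given in the text (5' to 3'); keys are illustrative labels.
PRIMERS = {
    "5p_forward": "GACTATTTCTCATTTCCTCCCACG",
    "5p_reverse": "CTGTGTCTCATCACAAGCTGGG",
    "3p_forward": "CTGCTCGAGGAATTAGTGAGGG",
    "3p_reverse": "CAGCATTCATATAGACAGGGATGAG",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the strand a primer anneals to, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, "
              f"Wallace Tm ~{wallace_tm(seq)} C")
```

More accurate estimates would use nearest-neighbour thermodynamics and account for salt and Mg2+ concentration, which this sketch deliberately ignores.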
Both systems amplified in 3.0 mM MgCl2 using a two temperature PCR reaction series of 35 cycles of 94°C for 30 seconds followed by 70°C for 2 minutes, followed by a final 10-minute extension at 72°C.

Examination of the published F18C5 sequence identified a 6.4 kb PstI fragment spanning the F18C5.2 locus, verified by restriction digestion of cosmid DNA with this enzyme (Figure 49). This fragment was subsequently subcloned from the prepared cosmid DNA into a Bluescript plasmid by ligation and transformation, and verified by PCR (Figure 50).

3. Is F18C5.2 a Trans-spliced Operon?

One of the unusual aspects of the nematode genome not found in most other metazoans is the phenomenon of trans-splicing and polycistronic transcription units ("operons"). Trans-splicing is defined as the ligation of exon sequences originating in independently transcribed RNAs. In C. elegans, one observes specific trans-splicing of one or the other of two "Splice Leader" sequences, SL1 or SL2, onto the 5' end of processed transcripts. This process in the worm is seen most often in the context of polycistronic operons, that is, sets of sequence contiguous genes separated by a very small distance from each other (typically about 100 bp in C. elegans) and co-transcribed from a single promoter (Spieth et al., 1993). The SL1 sequences typically appear on transcripts from the most 5' locus of an operon, while the transcripts of 3' downstream "internal" loci typically trans-splice the SL2 sequence.

Figure 49. PstI Restriction Gel of F18C5 Cosmid DNA
0.8% ethidium agarose gel of a PstI restriction digest of F18C5 cosmid DNA mini-prep; the 6.4 kb fragment containing the F18C5.2 locus is indicated.

Figure 50. 5' and 3' Locus Specific PCR of F18C5 6.4 kb PstI Subclone
5' and 3' F18C5.2 locus-specific PCR systems used to verify presence of the locus in DNA mini-preps of the 6.4 kb PstI subclone of the F18C5 cosmid.
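The search of a published sequence for a PstI fragment spanning a locus, as described above for F18C5, can be sketched in Python as follows. This is a minimal illustration only: the toy sequence and coordinates are invented, the molecule is treated as linear (a cosmid is circular), and fragment lengths are counted on the top strand with PstI cutting after CTGCA within its CTGCAG recognition site.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
PSTI_CUT_OFFSET = 5    # top-strand cut falls after CTGCA (CTGCA^G)

# Hypothetical toy sequence with two PstI sites; not cosmid F18C5.
TOY_SEQUENCE = "AAAA" + "CTGCAG" + "TTTTTTTT" + "CTGCAG" + "CC"


def find_sites(seq, site=PSTI_SITE):
    """Return 0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_bounds(seq, site=PSTI_SITE, cut_offset=PSTI_CUT_OFFSET):
    """Return (start, end) half-open intervals of fragments from a linear digest."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    edges = [0] + cuts + [len(seq)]
    return list(zip(edges, edges[1:]))


def digest_linear(seq, site=PSTI_SITE, cut_offset=PSTI_CUT_OFFSET):
    """Return fragment lengths, in sequence order, for a complete linear digest."""
    return [end - start for start, end in fragment_bounds(seq, site, cut_offset)]


def spanning_fragment(seq, locus_start, locus_end):
    """Return the single fragment containing [locus_start, locus_end), or None."""
    for start, end in fragment_bounds(seq):
        if start <= locus_start and locus_end <= end:
            return (start, end)
    return None  # the locus is interrupted by at least one cut site


if __name__ == "__main__":
    print("fragment lengths:", digest_linear(TOY_SEQUENCE))
    print("fragment spanning 12..20:", spanning_fragment(TOY_SEQUENCE, 12, 20))
```

Applied to a real cosmid sequence, the predicted fragment sizes could then be compared against the bands observed on a gel such as that in Figure 49.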
Examination of the F18C5.2 sequence context reveals that this locus lies only 130 bp downstream of a cDNA matching the 3' end of the predicted upstream gene, F18C5.3. This context strongly suggests the working hypothesis that this locus lies within an operon, downstream of a promoter in the 5' flanking region of F18C5.3. Indeed, a preliminary computational search for such a (eukaryote) promoter, using the TESS WWW site (Schug and Overton, 1997), reveals a strong candidate eukaryote-like promoter in this 5' flanking region (Figure 51). RACE (rapid amplification of cDNA ends) experiments, designed to elucidate whether or not F18C5.2 transcripts are SL2 spliced, and testing of the candidate promoter by molecular biological means were contemplated but not undertaken in this thesis project. However, if such a promoter exists, it would lie on a cosmid adjacent to F18C5; hence, the F18C5.2 locus might not be transcribed in the context of the cosmid F18C5 when used to construct a transgenic worm. Therefore, on the basis of the working hypothesis, it was decided to obtain and prepare DNA from another nematode cosmid clone, C50C12, thought to span the putative operon, including the putative promoter. This alternate clone was then used as the source of DNA for the construction of a transgenic worm containing F18C5.2 (see below).

[Figure 51. Image not recoverable from the extracted text.]

4. Mutations Mapped in the Vicinity of F18C5.2

ACEDB can efficiently correlate information from the genetic, physical and sequence maps of a genome. Using such information, one can ascertain the physical contig, and hence the genetic map interval, likely to contain a mutation of a specified locus. An initial experiment of screening known mutations on the map can therefore be contemplated. This type of analysis was done using ACEDB for F18C5.2 (Figure 52 and Figure 53; Table 26).

Table 26.
Caenorhabditis elegans mutant loci on genetic map spanning F18C5.2

Locus     Allele   Strain   Phenotype
let-22    mn22     SP377    early larval lethal
let-264   mn227    SP710    embryonic lethal
let-265   mn188    DR1218   mid-larval lethal
mel-1     it19     KK357    maternal effect lethal
mel-3     b281     KK40     maternal effect lethal

The ACEDB nematode database was queried to ascertain the nearest genetic loci (lin-23 and mua-1) positively assigned to candidate cosmids flanking F18C5 on the ACEDB physical map (Figure 52). The ACEDB genetic map (Figure 53) interval delimited by these genetic loci was inspected for mutant loci. Loci known in 1996 and subjected to a phenotype rescue screen in this thesis project are summarized in this table.

5. Construction of Transgenic Worms Carrying F18C5.2

Transgenic worms were constructed using C50C12 cosmid DNA. Four transgenic strains, designated BC5577 to BC5580, were received from the laboratory of Dr. David Baillie. Subsequent to receipt of the strains, single roller hermaphrodite worms of the transgenic strains were replated every 1-3 days and characterized for general phenotype, fecundity, and Rol-6 marker penetrance. The data are summarized in Table 27. The transgenic strain BC5580 generated large numbers of progeny and exhibited almost 50% transgenic array transmission. BC5580 was therefore selected as the principal strain for further analysis.

[Figure 52 and Figure 53 (ACEDB physical and genetic maps). Images not recoverable from the extracted text.]

[Table 27. Rotated table; recoverable headings: Strain, Plate, First Plating, Second Plating, Third Plating. Data not recoverable from the extracted text.]

6. Longevity of Transgenic Worms?

The longitudinal longevity study of the BC5580 transgenic worm strain gave the following results. Both sets of 20 transgenic and "wild-type" adult worms apparently laid the same number of progeny (approx. 400).
Taking the plating of the initial transgenic and "wild-type" sets of 20 parental worms as "Day 0", 10 plates of 10 young adult worms for each of the transgenic and "wild-type" populations were replated on Day 3, during the initial egg-laying period. These adult worms were replated each subsequent day until egg laying stopped, by Day 7/8. Some bacterial contamination of plates was thought to exist. The remaining time course of the populations is noted in Table 28. The final day of observations was Day 22, at which time all the worms were barely moving; however, at this point, all the plates appeared to have non-native bacterial contamination, which may have contributed to the worms' demise. Comparable averages of 5.625±2.134 transgenic worms and 5.00±1.85 "wild-type" worms per plate survived to Day 20. All the worms were dying at Day 22. It appears unlikely that the F18C5 array positively or negatively influences lifespan. A caveat to this result, however, is that the observed bacterial contamination of the plates was likely detrimental to all the worms, transgenic or wild-type, and may have obscured any small lifespan modification effect.

7. Phenotype Rescue by the F18C5 Transgenic Array?

A phenotype rescue experiment of F18C5.2 proximal mutant loci (Table 26) was attempted using the BC5580 transgenic strain, following the protocol specified in the Materials and Methods section of this thesis. All mutant strains were successfully crossed with BC5580; however, no apparent rescue of the mutant phenotypes by the transgenic array was observed.

Table 28.
Longevity Experiment Time Course

Population    Plate   Day 3   Day 8   Day 10   Day 13   Day 20
Transgenic    1       10      9       9        8        4
              2       10      8       8        8        [1]
              3       10      9       9        8        7
              4       10      10      10       10       5
              5       10      9       8        6        5
              6       10      10      10       9        9
              7       10      9       9        10       [1]
              8       10      8       8        6        3
              9       10      10      9        9        4
              10      10      9       9        8        8
"Wild-type"   1       10      6       5        5
              2       10      8       7        6        3
              3       10      7       7        6        3
              4       10      9       9        8        5
              5       10      7       7        7        6
              6       10      4       4        4        2
              7       10      6       6        6        5
              8       10      7       7        5        5
              9       10      9       9        9        7
              10      10      8       8        8        4

For each of the transgenic and wild-type control worms, 10 populations of 10 worms apiece were monitored for survival as documented in the thesis methods and results. Day 0 is the date of the synchronized egg lay of the parents generating the test populations.
[1] Lots of young worms; had contaminating young parent in population? Plate discarded from study.

Discussion

Given such a broad range of helicase functionality, elucidation of the true biological function of WRN would require additional functional analysis in suitable experimental systems. Yeast and mouse homologs are known (Gangloff et al, 1994; Watt et al, 1995; Lu et al, 1997; Stewart et al, 1997; Imamura et al, 1997), but for reasons elaborated earlier in this thesis, the model organism Caenorhabditis elegans was chosen as the experimental context within which to undertake such functional analysis. Specifically, a preliminary multiple sequence alignment (MSA) comparison and construction of a molecular phylogeny of the RecQ-like DNA helicase paralogs in the worm, based upon the highly conserved helicase domain VI, established the hypothesis that F18C5.2 is the best sequence homolog, hence candidate ortholog, to WRN (Figure 43 and Figure 44). A transgenic worm carrying the F18C5 cosmid was successfully constructed and an attempt was made to rescue the phenotype of five known mutant loci in the region of the genetic map spanning F18C5.2. The outcome of this screen was negative.
This is not entirely unexpected, since computational and EST analyses of the completed genome sequence in this region reveal a large number of gene loci, far outnumbering the small set of known mutations on the map. Thus, the probability of a transgenic rescue "hit" is limited, even if the cosmid array were proven to transcribe its genes efficiently, which was not rigorously verified experimentally during this thesis project. Furthermore, there was no significant evidence that the F18C5 array in strain BC5580 influences the longevity of transgenic worms relative to wild-type animals.

Preliminary analysis of the F18C5.2 locus also suggested that this locus could form part of a C. elegans operon, with a candidate promoter lying at the 5' end of F18C5.3. It is interesting to note that the upstream gene, F18C5.3, at the time of this writing, shows significant ψ-BLAST-level similarity to only a small set of proteins. These are predicted gene products in S. cerevisiae, S. pombe and A. thaliana, and a single human protein called "DRIM" (Entrez id# gi3242214), observed to be down-regulated in metastatic cells. It is still undecided in the scientific literature whether or not gene loci co-transcribed in a nematode operon necessarily bear any functional relationship to one another. It would, however, be interesting to examine any possible association of DRIM with WRN in human cells, given the WRN phenotypic feature of cancer susceptibility.

Since the phenotype rescue screen with the F18C5.2-spanning transgenic array failed to associate an existing mutant locus within the candidate genetic interval, alternative strategies were tentatively considered. These included "antisense" RNA expression of F18C5.2 in wild-type worms (Fire et al, 1991) and PCR screening of a worm "deletion library" (Moulder and Barstead, 1997).
In October 1997, building upon the growing resources of sequence databases and early results in the computational analysis of conserved WRN sequence motifs, I decided instead to pursue a bioinformatics-based strategy of comparative molecular phylogenetic analysis to elucidate WRN structure/function relationships as the primary thesis line of research.

E. Computer Program Listings (Perl scripts et al.)

################################## Perl 5.004 ##################################
#
# BlastList.pl : This script extracts the Entrez "Gi" sequence accession numbers
#                from a NCBI PSI-Blast HTML output file
#
# For PSI-Blast HTML output version running as of August 6, 1998
#
################################################################################
#
undef $/ ;        # slurp the whole input file into one string
$fstr = <> ;
while($fstr =~ /<a\s+href\s*=/) {
    $fstr = $' ;  # move the match pointer forward first...
    $fstr =~ m%([^>]*)\>(<INPUT\sTYPE[^>]*\>)?([^<]*)\<\s*/a\s*\>% ;
    $href_head  = $1 ;
    $inputype   = $2 ;  # if not null, then this was marked as a significant hit?
    $href_label = $3 ;  # text of the link (was tested below but never assigned)
    next if ($href_label =~ /Reference/) ;  # skip the first reference href...
    if( $inputype && $href_head =~ m#Entrez/query#) {
        $href_head =~ /uid=(\d+)/ ;
        print "$1\n" ;
    }
}

################################## Perl 5.004 ##################################
# This routine reads in a FASTA formatted file of "Entrez" records from stdin
# with ">gi|dddddddd|etc..." header lines
# and replaces the header line with ">gidddddddd" only
# while outputting the full header lines to another catalog file
#
# Version: 6/8/98 Creation
#
########################### by R.M. Bruskiewich ################################
($catalog = shift(@ARGV)) || die "Usage: CATALOG_FILE <INPUT_LIBRARY >OUTPUT_LIBRARY\n" ;
open(CATALOG,">$catalog") ;
while(<>) {
    if(/^\>\s*gi\s*\|\s*(\d+)/) {
        print ">gi$1\n" ;   # shortened header to the output library
        print CATALOG ;     # full header line to the catalog file
    } else {
        print ;
    }
}

################################## Perl 5.004 ##################################
# Converts genbank protein data records into .ace file data records
#
# Input:  A genbank flat file
# Output: A .ace file of ACEDB data records with suitable tag-value pairs
#         following the WRNdb modification of the C. elegans models.wrm
#
# This function ignores the sequence data for now...
#
# Version: 22/09/98 Creation
#
######################## by R.M. Bruskiewich ###################################
package main ;
$NONE=0; $MINIMUM=1; $MODERATE=2 ; $VERBOSE=3;
$DEBUGLEVEL=$VERBOSE;

# get the default paper prefix from the command line
($PaperClass::ppfix = shift @ARGV) ||
    die "\@%?!!ERROR - Need a default paper prefix!\n\nUsage: gb2ace.pl DEFAULT_PAPER_PREFIX\n" ;
open(STDERR,">>gb2ace.err") ;
($DEBUGLEVEL >= $MINIMUM) && (print STDERR "Executing gb2ace.pl!\n") ;

#### some global data ######
%month = ( "JAN" => "1", "FEB" => "2",  "MAR" => "3",  "APR" => "4",
           "MAY" => "5", "JUN" => "6",  "JUL" => "7",  "AUG" => "8",
           "SEP" => "9", "OCT" => "10", "NOV" => "11", "DEC" => "12" ) ;
$Papers = 0 ;
$Locus  = "" ;

package AuthorClass ;

sub new {
    my $Authors = [] ;
    bless $Authors ;
    return $Authors ;
}

sub Add {
    my ($aRef,$data) = @_ ;
    @$aRef = (@$aRef, $data) ;
}

sub parseAuthors {
    my ($Authors) = @_ ;
    my $AnAuthor ;
    # consume author text until the next keyword line is reached
    while( $_ && ($_ !~ /^\w/) && ($_ !~ /^\s{2}\w/) ) {
        tr/\-\.//d ;   # strip hyphens and periods from names and initials
        while( /\s*(\w+)\,(\w+)\,\s/  ||
               /\s*(\w+)\,(\w+)\,$/   ||
               /\s*(\w+)\,(\w+)\sand/ ||
               /\s*(\w+)\,(\w+)$/ ) {
            $_ = $' ;
            $AnAuthor = $1." ".$2 ;
            $Authors->Add($AnAuthor) ;
        }
        $_ = <ARGV> ;
    }
}

package PaperClass ;
$ppseq = 0 ;
$PaperList = {} ;
bless $PaperList ;

sub update {
    my ($href,@data) = @_ ;
    %$href = (%$href, @data) ;
}

sub collect {
    my ($Papers,$data) = @_ ;
    @$Papers = (@$Papers, $data) ;
}

sub name {
    my ($aPaper,$Locus) = @_ ;
    my $identity ;
    if($aPaper->{"Medline_acc"} ne "") {
        # Use medline id directly
        return ("med".($aPaper->{"Medline_acc"})) ;
    # ...otherwise, match on inferred title
    } elsif ($aPaper->{"Title"} ne "") {
        $identity = $aPaper->{"Title"} ;
    } elsif ($aPaper->{"Journal"} ne "") {
        $identity = $aPaper->{"Journal"} ;
    } else {
        # braces keep Perl from reading this as an undefined $LocusCitation
        $identity = "gi${Locus}Citation" ;
    }
    # fails silently if no $identity!
    return $PaperList->{$identity} ;
}

sub recordPaper {
    my ($aPaper,$Locus) = @_ ;
    my ($tag, $identity, $label) ;
    # Identity of ?Paper object: has it been seen before?
    # But, ignore Medline ids (you will use them directly?)
    if($aPaper->{"Medline_acc"} eq "") {
        if ($aPaper->{"Title"} ne "") {
            $identity = $aPaper->{"Title"} ;
        } elsif ($aPaper->{"Journal"} ne "") {
            $identity = $aPaper->{"Journal"} ;
        } else {
            $identity = "gi${Locus}Citation" ;
        }
        # ...if not, create a new (internal) sequence id
        if($PaperList->{$identity} eq "") {
            $ppseq++ ;
            $PaperList->update($identity,"$ppfix$ppseq") ;
        }
    }
}

sub parsePapers {
    my ($Class,$Locus) = @_ ;
    my ($RefNo, $Keyword) = (0) ;
    my ($Papers, $aPaper, $Authors) = ([], {}, []) ;
    bless $Papers, $Class ;
    bless $aPaper, $Class ;
    while(<>) {
        # Did you find another primary keyword?...
        if (/^(\w+)\s+/) {
            $Keyword = $1 ;
            # then record the current paper...
            $aPaper->recordPaper($Locus) ;
            # ...and add it to the paper list paper record
            $Papers->collect($aPaper) ;
            return $Papers if ($Keyword ne "REFERENCE") ;
            # ...else start another Reference.
            $aPaper = {} ;
            bless $aPaper ;
            next ;
        }
        if (/^\s{2}AUTHORS\s/) {
            # pad with a leading space so that the loop test in parseAuthors
            # does not mistake the author text for a new keyword line
            $_ = " $'" ;
            $Authors = new AuthorClass ;
            $Authors->parseAuthors ;
            $aPaper->{"Authors"} = $Authors ;
            redo ;
        } elsif (/^\s{2}TITLE\s+/ && $' !~ /Direct\sSubmission/) {
            $Title = $' ; chop($Title) ;
            while(<>) {
                last if ( /^\w/ || /^\s{2}\w/) ;
                chop ;
                /^\s*(\S)/ ;
                $Title .= "\\\n $1$'" ;
            }
            $aPaper->{"Title"} = $Title ;
            redo ;
        } elsif (/^\s{2}JOURNAL\s+/) {
            $JEntry = $' ; chop($JEntry) ;
            if($JEntry =~ /Submitted\s+/) {
                $JEntry = $' ;
                if ($JEntry =~ /(\(\s*\d{1,2}-\w{3}-(\d{4})\s*\))/) {
                    $aPaper->{"Year"} = $2
                }
                $JEntry = "Direct submission $JEntry" ;
                while(<>) {
                    last if ( /^\w/ || /^\s{2}\w/) ;
                    chop ;
                    /^\s*(\S)/ ;
                    $JEntry .= "\\\n $1$'" ;
                }
                $aPaper->{"Journal"} = $JEntry ;
                redo ;
            } elsif ($JEntry =~ /Unpublished(\W|$)/) {
                $JEntry = $' ;
                $aPaper->{"Journal"} = "Unpublished Genbank entry $Locus" ;
                if ($JEntry =~ /\(\s*(\d{4})\s*\)/) {
                    $aPaper->{"Year"} = $1
                }
            } else {
                # a well formed journal entry...
                if($JEntry =~ /\s*(\D+)/) {
                    $JEntry = $' ;
                    $1 =~ /(.+)\s+$/ ;
                    $aPaper->{"Journal"} = $1 ;
                    if($JEntry =~ /\s*(\d+)(\,|\s+(\((\d+)\))?\,)/) {
                        if($4 ne "") {
                            $aPaper->{"Volume"} = "$1\" \"$4" ;
                        } else {
                            $aPaper->{"Volume"} = "$1" ;
                        }
                        $JEntry = $' ;
                    }
                    if($JEntry =~ /\s*(\d+)(\-(\d+))?\s/) {
                        if($3 ne "") {
                            $aPaper->{"Page"} = "$1\" \"$3" ;
                        } else {
                            $aPaper->{"Page"} = "$1" ;
                        }
                        $JEntry = $' ;
                    }
                    if($JEntry =~ /\s*\((\d+)\)/ ) {
                        $aPaper->{"Year"} = $1 ;
                    }
                } else {
                    $aPaper->{"Journal"} = $JEntry ;
                }
            }
        } elsif (/^\s{2}MEDLINE\s+(\d+)/) {
            $aPaper->{"Medline_acc"} = $1 ;
        }
    }
    return $Papers ;
}

sub printPapers {
    my ($Papers) = @_ ;
    my ($i, $label, $person, $aPaper) = (0) ;
    foreach (@{$Papers}) {
        # Print the (internal) name of the ?Paper object...
        $aPaper = $Papers->[$i] ;
        $label = $aPaper->name($main::Locus) ;
        print "Paper : \"[$label]\"\n" ;
        # then print out the fields
        foreach $tag (sort keys %{$aPaper}) {
            if($tag eq "Authors") {
                $AuthorList = $aPaper->{$tag} ;
                foreach $person (@{$AuthorList}) {
                    print "Author \"$person\"\n" ;
                }
            } else {
                print "$tag \"$aPaper->{$tag}\"\n" ;
            }
        }
        print "\n" ;
        $i++ ;
    }
}

package main ;
$Comment = "" ;
while(<>) {
    chop ;
    next if $_ eq "" ;
    if(/^LOCUS\s+(.+)$/) {
        $1 =~ /(\d+)\s+(\d+)\s+aa\s+(\d{1,2})-(\w{3})-(\d{4})\s*$/ ;
        $Locus = "$1" ;
        printf "Protein : \"gi%07d\"\n", $Locus ;
        print "DB_searched \"Genbank\" \"$Locus\"\n" ;
        print "Date \"$5-$month{$4}-$3\"\n" ;
    } elsif (/^DEFINITION\s+/) {
        $Definition = $' ;
        while(<>) {
            last if ( /^\w/ || /^\s{2}\w/) ;
            chop ;
            /^\s*(\S)/ ;
            $Definition .= "\\\n $1$'" ;
        }
        print "DB_definition \"$Definition\"\n" ;
    } elsif (/^ACCESSION\s+/) {
        next ;  # ignore
    } elsif (/^PID\s+/) {
        next ;  # ignore
    } elsif (/^DBSOURCE\s+(.+)$/) {
        $1 =~ /(\w+)\:\slocus\s(\w+)(\,\saccession\s(\w+))?/ ;
        print "Database \"$1\"" ;
        print " \"$2\"" if ($2 ne "") ;
        print " \"$4\"" if ($4 ne "") ;
        print "\n" ;
    } elsif (/^KEYWORDS\s+/) {
        $KW = $' ;
        while(<>) {
            last if ( /^\w/ || /^\s{2}\w/) ;
            chop ;
            /^\s*(\S)/ ;
            $KW .= " $1$'" ;
        }
        while($KW =~ /\s*(\w[^\;\.]+)[;.]/) {
            print "Keyword \"$1\"\n" ;
            $KW = $' ;
        }
    } elsif (/^SOURCE\s+(.+)$/) {
        next ;  # ignore
    } elsif (/^\s{2}ORGANISM\s+(\w.+\w)\s*$/) {
        if($1) {
            print "Species \"$1\"\n" ;
        }
    } elsif (/^REFERENCE\s+/) {
        $Papers = PaperClass->parsePapers($Locus) ;
        $i = 0 ;
        foreach (@{$Papers}) {
            # Add the ?Paper object reference (internal name) to the ?Protein record
            $aPaper = $Papers->[$i] ;
            $label = $aPaper->name($Locus) ;
            print "Reference \"[$label]\"\n" ;
            $i++ ;
        }
        redo ;
    } elsif (/^COMMENT\s+/) {
        $Comment = "$'\n" ;
        $Comment =~ /^\s*(\S)/ ;
        while(<>) {
            last if ( /^\w/ || /^\s{2}\w/) ;
            /^\s*(\S.+)/ ;
            $Comment .= " $1$'" ;
        }
        printf "DB_comment \"[gi%07d]\"\n", $Locus ;
        redo ;
    } elsif (/^FEATURES\s+(.+)$/) {
        next ;  # ignore
    } elsif (/^ORIGIN\s+(.+)$/) {
        next ;  # ignore
    } elsif (/^\/\//) {
        # end of record: print the collected papers and any comment text
        print "\n" ;
        $Papers->printPapers if ref($Papers) ;  # skip records without references
        $Papers = 0 ;
        if($Comment ne "") {
            printf "LongText \"[gi%07d]\"\n", $Locus ;
            print $Comment ;
            print "***LongTextEnd***\n\n" ;
            $Comment = "" ;  # clear...
        }
    } else {
        next ;  # ignore anything else?
    }
}

F. ClustalW (1.7) Multiple Sequence Alignment of WRN-related Helicases

(Canonical helicase domains are annotated e.g. <- Domain I ->)

gi0096321 gi0461924 g i l l 6 9 2 6 1 gil706338 gil001719 gi2500540 gi2648271 g i l l 7 7 0 1 0 gi2621248 -; gi0266336 gi0417180 gi0050823 gil708418 gi0446778 - gi2150025 gi0729821 gi2443810 gi2773184 g i l l 7 0 5 0 7 -gil352438 gi0124218 gi2500523 gi0729329 g i l l 7 4 4 5 6 gil709532 MSTARTENP VIMGLSSQNGQLR 22 gi2745894 gi2136088 MGLSSQNGQLR 11 gil709533 MSTARTENP VLMGMSSQNGQLR 22 gi3047117' QSRSGYQQHPPPQYVQRGNYAQNHQQQ 27 gi0132530 gi0421132 : gi0003641 FFGFSKER 8 gi0118411 FSNNRR GGY 9 gi2500528 DRGR GDY 7 gi3023628 .
DRGR SDY 7 gi0130256 ERGR SDY 7 gi2580554 DDRGR SDY 8 gi0113825 DRRQ DGF 7 gi2558533 FFNDRGS S—.8 gi0129383 MSGYSSDR DRGR 12 gi2500527 MSSYSSDR DRGR 12 gil592565 MRGGGFGDR DRDR 13 gi0133134 -GDYHGIRNGAVEKRRD 16 gi0118284 . RGGDFRGGR NS 11 gi l l 6 9 2 2 8 RPVPSSHAVSAPAHKSSV 18 gil280208 YKSTEHLSPNDNENDTSYVIESDE- 24 gi2130973 RTCVI PS ISENELQDLEQQAKEEK 24 gi2619051 gi2851488 gil705486 ETQPSYDIDNFDIDDF 16 gi2276199 : DLDEEPPIVDLDDS-FDNF 18 gil l 7 5 4 8 4 PFPSTIIPESTVKENSTRPYVNSHLVANDKITATPFHSEAVVSPLQSNIRNSDIAEFDEF 60 gi2128837 KSYNLKFDYIELCPFCLLKNIYKR 24 gi2621738 CQHCLHEGYLTV 12 g i 1666893 LQQLQHRVQLKYAA 14 248 Appendices gil666897 gi0005022 gi0131812 gi2495145 gi2495146 gi0119540 gi2134009 gi2058510 g i l l 6 9 3 4 5 gi3183486 gi2995310 gil706438 gi0281858 gi0281834 gi0544162 g i l l 6 6 5 0 4 gil931649 gi2072674 gi0464912 gi0861396 gil710074 gil363325 gi0130806 gil708151 gil066920 gi2500113 gi0790392 g i l l 7 3 1 2 1 gil418571 gi3036880 gi2408082 gi0116351 gi2642224 gil706437 gil752904 gi3257105 gil730960 gi2131417 gi2983030 gi3217395 gi3183240 gi2622454 MKFYIDDLPIL 11 MKFYIDDLPVL 11 MKLNVDGLLVY 11 . 
MKLNVDGLLVY 11 MKLNVDGLLVY 11 MKLNIEGLLVY 11 MKFYIEDLLVY 11 MDYENQIANIFSLNGELS 18 MTDDFAPDGQLA 12 MTKPSLPELLHAA 13 —MSESVSMSVPELLAIA 16 •MALTAALKAQIAAWYKALQ 19 SPFTG 5 • — DVLLAISNELLDDATDL 17 EHIDLLEDDLEKDAILDDSMSFSFGRQHMPMSHSDLELI 39 DQGDDCEFIPACDETQEVP 19 KKG 3 •—VVDEEVIEKKKSK 13 •VSAIEKAKKDSEKER 15 TEILEQIQSRERLQ 14 G i KKKELSTITEE 11 —RLEEIRKR 8 DPLRKKRKGA 10 RFSSFSVREPEA 12 PPKASNSTEVWINPKL 16 MEEL 4 FSLTKTKNFYEE 12 -FKSRLREFILEKEKA 15 MDSISFYEY 9 KEKCEIHGFS 10 MEFKGYIK 8 MENPLFCPDCGM 12 249 Appendices gi0096321 MMS 3 gi0461924 MS 2 g i l l 6 9 2 6 1 gil706338 gil001719 gi2500540 gi2648271 g i l l 7 7 0 1 0 gi2621248 gi0266336 G 1 gi0417180 MSASQDSRSRDNG 13 gi0050823 MSGGSADYNREHGG 14 gil708418 gi0446778 MDDRNEIPQDG 11 gi2150025 MTNSEQNPPSEEN 13 gi0729821 MATTATMATSGSARKRLLKEE 21 gi2443810 MAAAAVAGVAG LTTAHAKRLLREE 24 gi2773184 MAGKKAEKD 9 g i l l 7 0 5 0 7 MEED-. 4 gil352438 MADE 4 gi0124218 MSEG 4 gi2500523 MASEG 5 gi0729329 • MGSINNNFNTNN NSNTDLD R--D-WK 23 g i l l 7 4 4 5 6 MAESLIQKLE -NANLNDR E—S-FK 21 gil709532 GPVKASAGPGGGGTQPQPQLNQLK NTSTINNGT—PQQAQSMAATIKPGDDWK 73 gi2745894 KASAGPGGGGTQTQQQMNQLK NTSTINNGT—PQQAQSMAATIKPGDDWK 48 gi2136088 GPVKPTGGPGGGGTQTQQQMNQLK NTNTINNGT—QQQAQSMTTTIKPGDDWK 62 gil709533 GPLKPSAGPGGGGTQTQ-QINQLK NASTINSGS--QQQAQSMSSIIKPGDDWK 7 2 gi3047117 FQQAPSQ-PHQYQQQQQQQQQWLRRGQIPGGNSNGDAVVE--VEKTVQSEVIDPNSEDWK 8 4 gi0132530 gi0421132 gi0003641 -NGGTSANYNRRGSSNYKSSGNRWVNG KHIPGPKN AKLQKAELFGVHDDPDY 59 gi0118411 GNGGFFGGNNGGSRSNGRS-GGRWIDG KHVPAPRN EK-AEIAIFGVPEDPNF 59 gi2500528 DGIGGRGDRSGFGKFERGG-NSRWCD KSDEDDWS KPLPPSERLEQELFSGG 57 gi3023628 DGIGSRGDRSGFGKFERGG-NSRWCD KSDEDDWS KPLPPSERLEQELFSGG 57 gi0130256 ESVGSRGGRSGFGKFERGG-NSRWCD KADEDDWS KPLPPSERLEQELFSGG 57 gi2580554 DGIGNR-ERPGFGRFERSG-HSRWCD KSVEDDWS KPLPPSERLEQELFSGG 57 gi0113825 DGMGNRSDKSGFGRFDR-G-NSRWSDD RNDEDDWS KPLAPNDRVEQELFSGS 57 gi2558533 SR-GRYERGGFGGGG-NSRWVEE-—CRDED-WS KPLPPNERLEHELFSGS 53 gi012 9383 DRG-FGAPRFGGSRAGPLSGKKFGNPG--EKLVKKKWN LDELPKFEKNFYQEHPDL 65 
gi2500527 DRG-FGAPRFGGSRTGPLSGKKFGNPG--EKLVKKKWN LDELPKFEKNFYQEHPDL 65 gi1592565 DRGGFGARGGGG LPPKKFGNPG—ERLRKKKWD LSELPKFEKNFYVEHPEV 62 gi0133134 DRG--GGNRFGGG--GGFGDRRGGGGGGSQDLPMRPVD FSNLAPFKKNFYQEHPNV 68 gi0118284 DRNSYNDRPQGGNYRGGFGGRSNYNQP—QELIKPNWDEE—LPKLPTFEKNFYVEHESV 67 g i l l 6 9 2 2 8 FVSSSVEKPSQGQRYDADGGHNRGSNN KIARSSSD RFHDGTSVHEGYGSL 68 gil280208 —DLEMEMLKHLSPNDNEND-TSYVIESDEDLEMEMLKSL--ENLNSGTVEPTHSKCLKM 7 9 gi2130973 YNDVSHQLSEHLSPNDDEND-SSYIIESDEDLEMEMLKSL—ENLNSDMVEPTHSKWLEM 81 gi2619051 • gi2851488 gi1705486 DDDDDWEDIMH-NLAASKSSTAAYQPIK--EGRPIKSVS ERLSSAKTDCLPVSSTAQ 7 0 gi227 6199 HVGSTSEEVVSGDIAPEEEEEEGHDSFDDFESVPAQPPSK--NTLASLQKSDSEIALNQQ 7 6 g i l l 7 5 4 84 DIDDADFTFNTTDPINDESGASSDVVVIDDEEDDIENRPL—NQALKASKAAVSNASLLQ 118 gi2128837 LTRNNRCRYGNLEICINCGINEIKEEVKIS EEFI—EKFLKRFKDVDKVLSLLR 7 6 gi2 621738 VSSRSSTVHAGQIICSRCVDELIKRELKFAGMDGSTFRNF—RRLIRRGVSLDKILEMMS 70 gi1666893 KRLRQEEEE—RENLLRLSREMLETGPEAERLEQLESGEEELVLAEYESDEEKKVASRVD 72 gil666897 gi0005022 FPYPRIYPE--QYQYMCDLKHSLDAG—GIALLEMPSG TGKTISLLSLIV 57 gi0131812 FPYPKIYPE--QYNYMCDIKKTLDVG--GNSILEMPSG TGKTVSLLSLTI 57 gi24 9514 5 FPYDYIYPE--QFSYMLELKRTLDAK--GHGVLEMPSG TGKTVSLLALIV 57 gi24 9514 6 FPYDYIYPE—QFSYMLELKRTLDAK—GHGVLEMPSG TGKTVSLLALIV 57 250 Appendices gi0119540 gi2134009 gi2058510 g i l l 6 9 3 4 5 gi3183486 gi2995310 gil706438 gi0281858 gi0281834 gi0544162 g i l l 6 6 5 0 4 gil931649 gi2072674 gi0464912 gi0861396 gil710074 gil363325 gi0130806 gil708151 gil066920 gi2500113 gi0790392 g i l l 7 3 1 2 1 gil418571 gi3036880 gi2408082 gi0116351 gi2642224 gil706437 gil752904 gi3257105 gil730960 gi2131417 gi2983030 gi3217395 gi3183240 gi2622454 FPYDYIYPE—QFSYMRELKRTLDAK—GHGVLEMPSG TGKTVSLLALIM 57 FPYDYIYPE--QYSYMLELKRTLDAK--GHGVLEMPSG TGKTISLLSLIV 57 FPYSYIYPE--QYSYMVALKRSLDNG--GPCILEMPSG TGKTVSLLSLIS 57 QNIKGFRPRAEQLEMAYAVGKAIQN--KSSLVIEAGTG TGKTFAYLAP-- 64 KAIPGFKPREPQRQMAVAVTQAIEK—GQPLVVEAGTG TGKTYAYLAP— 58 VTAVGGTERPGQVAMAEAVEEAIDG--GSHLLVQAGTG TGKSLGYLVP-- 59 
VAALGGTRRRGQQEMAAAVAHAFET—GEHLVVQAGTG TGKSLAYLVPAI 64 EQIPDFIPRAPQRQMIADVAKTLAGEEGRHLAIEAPTG VGKTLSYLIPGI 69 RQGQYNQGYNGGGRRDSRGGMGERIKPVDWGNVSLVPGNWKVLDGKAIKKAGEIKTSTPE 65 SPDRVGQLRQERLRLKKQIQQLENHIR--DKESQ-KSQ FLSSTATRIFQYETPK 68 DSEKENEDFEEDNNNNGIEYLSDSDLERFDEERENRTQVADIQELDNDLKIITERKLTGD 9 9 KIKRGYTLRTRASVKNKCDDSWDDGIDEEDVSKRSEDTLNDSFVDPEFMDSVLDNQLTIK 7 9 ELMENDQDAMEYSSEEEEVDLQTALTGYQTKQRKLLEPVDHGKIEYEPFRKNFYVEVPEL 63 EDQLFELGGTDDEDVEDNTDNSNIAKIAKLKAKKRVKQIYYSPEELEPFQKNFYIESETV 60 KHKKDKKDKKEKKDKKHKKHKKEKKGEKEVEVPEKESEKK—PEPTSAVASEFYVQSEAL 71 QRQKLREAIMERKRREALMNEPKKQKIEPVSTVKLEKSVNQKISANSEDFDVSGPSNSRE 7 5 SRIDQARRGMVEVSRKRKAPARDTDQFLEPQDEAAPSEE YNNDEKSEKQRDSDFFDD 71 NDENHKEDNNESQIVKELDFFRNKRIISKVEDDREKTTENDSPNKEEKSGNDDGLIKPVI 61 SLRRYDELREKMIGKVESKIEKKVKKNKKKASIEILASRKRVHDSDGDSENEVEETVGDV 71 NQSRKNQMSNNSTTYHRETKRRNINAEASTSDNCNNSNTSVDPMDEYLVTAEYTMPSTSE 68 RHLDVSLEEQDFIPRPYESDSENNDTSKSTRGGRISDKD YKLSELNSQIITLLDKI 66 IDVRINEDENFSFEIESWEAGNEKALSELMPGYEKRDGQMMMMREVADAFANREHALIEA 72 SVKFPFEPYECQRIFMKNVVDVLDRK--LDAALESPTG TGKTLSLLCSTL 64 QYFPYEKFRPNQREFIEIVKEAVKRG—ENLIVEAPTG FGKTISVL 4 8 LNNWIGDVFYDILPEKGFDLRDEQVFMAFQLERAFKEK KVMFAEAGVGTGKTLVYL 68 NLDPFSELTNLAQKYIPRERDYEDPIEAMMKAKQESNE MSIPNYSNNSVITTIPQM 71 LLTKGFEERKSQREMIKIVEEALEEG GVKLIEA 4 2 GKTQVVFTNDIGIPLESKVYEEPPEEEEFIEPVPVKND KLFEAQWEHDFTPH 62 EKFP YPKVRE PQKRMMLKIYECIKNK RN LIVE A 41 MRDNCTCKGKGRLSFFRNIIRPPEKSDTLNEDLQRRYP HIPPEIIENFPF 62 251 Appendices gi0096321 gi0461924 g i l l 6 9 2 6 1 gil706338 gil001719 gi2500540 gi2648271 g i l l 7 7 0 1 0 gi2621248 gi0266336 gi0417180 gi0050823 gil708418 gi0446778 gi2150025 gi0729821 gi2443810 gi2773184 g i l l 7 0 5 0 7 gil352438 gi0124218 gi2500523 gi0729329 g i l l 7 4 4 5 6 gil709532 gi2745894 gi2136088 gil709533 gi3047117 gi0132530 gi0421132 gi0003641 gi0118411 gi2500528 gi3023628 gi0130256 gi2580554 gi0113825 gi2558533 gi0129383 gi2500527 gil592565 gi0133134 gi0118284 g i l l 6 9 2 2 8 gil280208 gi2130973 gi2619051 gi2851488 gil705486 gi2276199 g i l l 7 5 4 8 
4 gi2128837 gi2621738 gil666893 gil666897 gi0005022 gi0131812 gi2495145 gi2495146 YVDWPPLILRHTYYMAEFETT-YVDWPPLILRHTYYMAEFETT-MTDKIT-MAFPEYSPAASAAT-MTNTLTST-MEVEYMN--FADLG--FADLG--FNDLG--FADLQ--FADLG--FNELN-MKEVNKVE FEDLG-M S H --FKNYQ-MKGLE FSEFD-PDGMEPEGVIESNWNEIVDS FDDMN-PDGMEPEGVIESNWNEIVDS FDDMN-PEGMDPDGVIESNWNEIVDN FDDMN-M VDQLEDSVIETNYDE VI DT FDDMN-PASMEPEGVIESTWHEVYDN FDDMN--VQVASTGEIESNYDEIVEC FEALN-— DMTKVEFETSEEVDVTPT FDTMG---DMTTVEFQTSEEVDVTPT FDTMG-— DMATVEFESSEEVNVI PT FDKMG-RLVFET-SKGVEPIAS FAEMG-— IMENVELTTSEDVNAVSS FEEMN--ITDIEESQIQTNYDKVVYK FDDME--ITEIDSGLIETNYDNVVYK FDDLN-TALNIPKKDTRPQTDDVLNTK GNTFEDFY-GQMKAQPVDMRPKTEDVTKTR GTEFEDYY-KTLKLPPKDLRIKTSDVTSTK GNEFEDYC-KTLKLPPKDLRIKTSDVTSTK GNEFEDYC-KTLKLPPKDLRIKTSDVTSTK GNEFEDYC-KTLKLPPKDLRIKTSDVTSTK GNEFEDYC-ARLKLPAPDTRYRTEDVTATK GNEFEDYF-MSKTHLTEQK . FSDFA-MSKTHLTEQK -FSDFA-H-SSGIKFDNY-DNIPVDASG-KDVPE-PILDFSSPP-Q-SSGINFDNY-DDIPVDASG-KDVPE-PITEFTSPP-—NTGINFEKY-DDIPVEATG-NNCPP-HIESFSDVE-—NTGINFEKY-DDIPVEATG-NNCPP-HIESFSDVE---NTGINFEKY-DDIPVEATG-NNCPP-HIESFSDVE-—NTGINFEKY-DDIPVEATG-SNCPP-HIENFSDID-—NTGINFEKY-DDIPVEATG-SNCPP-HIESFHDVT---NTGINFEKY-DDIPVEATG-HNGPQ-PIDSFHDLE-ARRTAQEVETYRRSKEITVRG-HNCPK-PVLNFYEAN-ARRTAQEVDTYRRSKEITVRG-HNCPK-PVLNFYEAN-ARLTPYEVDELRRKKEITVRGGDVCPK-PVFAFHHAN-ANRSPYEVQRYREEQEITVRG—QVPN-PIQDFSEVH-RDRSDSEIAQFRKENEMTISG-HDIPK-PITTFDEAG-GVGSDISQESYCRRNEISVTG-GDVPA-PLTSFEATG-ERNLGLPTKEEEEDDENEANEG-EEDD-DKDFLWPAP-GTNGCLPP-EEEDGHGNEAIK—EEQE-EEDHLLPEP-M L H . _ M A . 
NINFSESIQNYTDKSAQNLASRNLKHE-RHDMHGRFRGFLQDDSEEFS D-SSSLDRPLLGEMKDKNHKVLMPS L-IR-NPLDKPELTRYDIITGSEE—DKI-PRFDPLENHELTRYDTVT-SES--ERT-EDEDDLEEEHITKIYYCSRTH QAEVLNLES-RFQSLSFPH-•EVGLLGADM-•DDPMLSYPW-•ENYKIDELD-•PRVPLDKLA-SQLAQF--LKAPIL ; -LKAPIL -LPEFIL -IHPRVL -LSEKRC -LSDNIL -ISENGL -ISHDIL -ISGDIN -LSESLL -LSESLL -LKESLL -LKPELL -LREELL -LEGDLL -LREDLL -LREDLL -LREDLL -IKDDLL -LKEDLL -LDENLL -LKPNIV -LKRELL -LKRELL -LKRELL -LKRELL -LKRELL -LKRELL -LKRELL-— -LHPKVV -LHPKVV -LDELLM -LDGLLL -MGEIIM -MGEIIM -MGEIIM -MGEIIM -MGEIIM -MGEIIM -FPANVM -FPANVM -FPQYVM -LPDYVM -FPDYVL -FPSEIV -NEEQVT -NAKQIN -RAQSL -GAKQV -TKEMMK -NKELYD -SKEVLG -IPEELK -VPEKFK -VHEVKKSPF- -GK 35 34 17 25 19 18 19 14 16 32 44 45 31 42 43 50 53 38 30 33 34 35 58 56 108 83 97 107 119 21 21 98 98 95 95 95 95 95 91 106 106 104 108 108 109 120 120 8 16 112 112 157 115 109 110 SYQQHYP-E-HRKLIYCSRTM SEIDKA—LAELKR LM 90 AYQMHYP-E-HRKIIYCSRTM SEIEKA—LVELEN LM 90 AYQRAFPLE-VTKLIYCSRTV PEIEKV—IEELRK LL 91 AYQRAYPLE-VTKLIYCSRTV PEIEKV—IEELRK LL 91 252 Appendices gi0119540 AYQRAYPLE-VTKLIYCSRTV PEIEKV—IEELRK LL 91 gi2134009 AYQRAFPLD-VTKLIYCSRTV PEIEKV—VEELRK LM 91 gi2058510 SYQVKNP SIKLIYCSRTV PEIEQA—TEEARR VL 89 g i l l 6 9 3 4 5 —ALVFG KKTIISTGSK —NLQDQL—FNRDLP A l 93 gi3183486 —ALRAK KKVIISTGSK ALQDQL—YSRDLP TV 87 gi2995310 --ALAHG ERVVVATATL -ALQRQL—VERDLP RT 88 gil706438 IRALCDD APVVVSTATI ALQRQL--VDRDLP QL 95 gi0281858 MVSTANV ALQDQI—YSKDLP LL 21 gi0281834 VVSTANV ALQDQI—YSKDLP LL 21 gi0544162 AIAREEQ KTLVVSTANV ALQDQI--YSKDLP LL 100 gi l l 6 6 5 0 4 AGQLSEEEATKWREEHVITIFGDDCPP-PMSSFDHLCGIVPPYLL 109 gil931649 STNYKMDQPQTDFRAHVSDQG-RYACD-SWNTPRDSSFSVDRYVQVN 113 gi2072674 MTAVKHTTEST FAKLG—VRDEIV 22 gi04 64 912 NEHPPPSWSPKIKREKSSVSQKDEEDD-FDDDFSLSDIVSKSNLSSKTNGPTYPWSDEVL 158 gi0861396 GKKQFLDDGEFFTDRNVPQID EA-TKMKWASMTSPPQEALN 119 gil710074 : MTKLQQ 6 gil363325 AKMSQEEVNVFRLEMEGITVKGKGCPK-PIKSWVQCG—ISMKIL 105 gi0130806 SSMSEMEVEELRLSLDNIKIKGTGCPK-PVTKWSQLG—LSTDTM 102 gil708151 
TSLPQSDIDEYFKENEIAVEDSLDLALRPLLSEDYLS--LDSSIQ 114 gil066920 TQESPLDPNDFVANPLAIGTG ERLIRGQDIIERRDK 111 gi2500113 MADRQS - LEEALR 12 gi0790392 VDEEEEKPLKCLKIFYASRTHS QLEQLAEELAKTRFQP 109 g i l l 7 3 1 2 1 TNTVEASALRKSYKGNVSGIDIP LPIGSFEDLISRFSFDKR L 103 gil418571 EKRMQKLLDDIRRTNRIFTWG—DHLPNIILRFSDSS—MSPSLL 112 g i 3036880 MDHVELRTEADAVLA 15 gi2408082 QSEDLFNNGYSSKVSELLRKLS PDNEKPP—IVQKIY 103 gi0116351 DGKVSRDPNNGDRFDVTNQNP VKIYYASRTYSQLG 101 gi2642224 NSIKSVQHR ERPK-VTAEDLG 20 gil706437 PPGIGKTIGYLIPAALFAKKS KKPVIISTYSTLLQQQILT K 113 gil752904 AWVQRQKETKPLDFATWQTSG AGGAEK—TDEKLK ' 97 gi3257105 AGVLPYAISLGYKVVYLARTH KQMDR--VIEELR 80 gil730960 LFAISYARYVGKPAIIACADE TLIEQL—VKKEGD 101 gi2131417 IEKLKSTEFYASQIKHCFTIP SRTAKYKGLCFELAP 107 gi2983030 PTGTGKTFAYLIPIIVNKQKA IISTGTKILQDQLK 77 gi3217395 TPGETPPVTLKEELEPVKKPE PVETKCTCLPRVRIYYG TRT 103 gi3183240 PTGVGKTLGYLIPALYFAERR KRVLI--LTETID 73 gi2622454 PQPRPGQLDIINDIYQAIEEG YRYVI—LEAGTG T 95 253 Appendices gi0096321 gi0461924 g i l l 6 9 2 6 1 g i l 7 0 6 3 3 8 g i l 0 0 1 7 1 9 gi2500540 gi2648271 g i l l 7 7 0 1 0 gi2621248 gi0266336 gi0417180 gi0050823 g i l 7 0 8 4 1 8 gi0446778 gi2150025 gi0729821 gi2443810 gi2773184 g i l l 7 0 5 0 7 g i l 3 5 2 4 3 8 gi0124218 gi2500523 gi0729329 g i l l 7 4 4 5 6 g i l 7 0 9 5 3 2 gi2745894 gi2136088 g i l 7 0 9 5 3 3 gi3047117 gi0132530 gi0421132 gi0003641 gi0118411 gi2500528 gi3023628 gi0130256 gi2580554 gi0113825 gi2558533 gi0129383 gi2500527 g i l 5 9 2 5 6 5 gi0133134 gi0118284 g i l l 6 9 2 2 8 g i l 2 8 0 2 0 8 gi2130973 gi2619051 gi2851488 g i l 7 0 5 4 8 6 gi2276199 g i l l 7 5 4 8 4 gi2128837 gi2621738 g i l 6 6 6 8 9 3 g i l 6 6 6 8 9 7 gi0005022 gi0131812 gi2495145 gi2495146 
-EALNDLGYEKP--EALTDLGYEKP--KAVSDLGFETP--RAIGDVGYESP--QLLADIGFEAP--NAIRNKGFEKP--NAVRRKGFEKP--RALEGLGYTEP--RALDDMGFEST--RGIYAYGFEKP--RGIYAYGFEKP--RGIYAYGFEKP--RGIYAYGFERP--RGIYGYGFEKP--RGIFAYGFEKP--RGIYAYGFEKP--RGIYAYGFEKP--RGIYAYGFEKP--RGVYQYGFEKP--RGIYAYGYETP--RGVFGYGFEEP--RGIFGYGYETP--MGIFEAGFEKP--MGIFEAGFERP--MGIFEMGWEKP--MGIFEMGWEKP--MGIFEMGWEKP--MGIFEMGWEKP--MGIYEKGFERP--EALEKKGFHNC--EALEKKGFHNC--ENIKLASFTKP--ENIKLARFTKP--GNIELTRYTRP--GNIELTRYTRP--GNIELTRYTRP--GNIELTRYTRP--GNIQLTRYTRP--GNINLSRYTRP--DVIARQNFTEP--DVIARHNFTEP--DVLMDQHFTEP--KEIRRQGYKAP--NEVKAEGFDKP--REMHQAGFSAP--CLKMYFGHSSF--CLKTYFGHSSF-—LAHYFGYEKF---LQETFGYQQF--IFHKKFGLHNF--TLKSKFGFNQF--CLKHKFHLKGF--EIIKSRGIEEL--RMLKREGNTVL-DVRLVSLGSRQN-AYRTSQLGYEEP-DYRTKELGYQED-SFYEQQEGEKLP-SFYEQQEGEKLP-SPIQAECIPHLLNGR-SPIQAECIPHLLDGR-SPIQQSCIPHLLNGN-TAlQAATIPALMAGS-TQIQTEAIPLLLSGR-TDIQMKVIPLFLNDE-TEIQREIIPRFFEGE-TKVQQSVIPAALERK-TPIQALTLPVTLDGM-SAIQQRAILPCIKGY-SAIQQRAILPCIKGY-SAIQQRAIIPCIKGY--SAIQQRAIMPILGER— SAIQQRAIIPCVRGR— SAIQQRGIKPILDGY--SAIQQRAIKQIIKGR--SAIQQKAIKQIIKGR— SAIQQRAVPAILKAR--SAIQQRAVLPIISGR— SAVQSRAIIQICKGR— SAIQQRAIMPIIEGH— SAIQQRAILPITEGR— SPIQEEAIPVAITGR— SPIQEESIPIALSGR-SPIQEESIPIALSGR-SPIQEESIPIALSGR-SPIQEESIPIALSGR-SPIQEESIPIALSGR-•SPIQEESIPIALTGR-•TPIQALALPLTLAGR-•TPIQALALPLTLAGR-•TPVQKYSIPIVTKGR— TPVQKYSVPIVANGR— • T PVQKHAIP11KEKR--•T PVQKHAIP11 KEKR— •T PVQKHAI P11 KEKR- -• T PVQKHAIP11KGKR--• T PVQKHAIP111EKR— • T PVQKHAI P11KS K R — • TAlQAQGWPVALSGL— • TAlQAQGWPVALSGL— •TPIQCQGFPLALSGR— •TAIQAQGWPIAMSGS--•TGIQCQGWPMALSGR--•TPIQAQSWPIALQGR— •KPVQWKVIHSVLEER— •KPVQWKVIHSVLEER--•RSGQDEAIRLVTEAR— •RPGQEEIIDTVLSGR--•RTNQLEAINAALLGE— •RHRQKQCILSTLMGH— •RKNQLEAINGTLSGK— -LPVQTLSVKAGLLNG— •RPVQVLAVDAGLLEG--—LCVNEDV-KSLGSV— E E E — •FLGLGLTSRKNLCLH— •FRGLGLTSRKNLCLH— - FLGLALS SRKNLCIH — -FLGLALSSRKNLCIH--254 — - DVLGMAQTGS GKTA 75 — - DVLGMAQTGS GKTA 7 4 --DVLGMAQTGS G K T A — 57 — - DVVGLAQTGT GKTA 65 — DMLAQSQTGT GKTA 59 — YNIVAQ ART G S GKTA 59 -KDMVGQAQTGS GKTA 60 —DLVVKSQTGS GKTA 54 — 
DVVGEAQTGT GKTA 56 --DVIAQAQSGT GKTA 72 --DVIAQAQSGT GKTA 84 --DVIAQAQSGT GKTA 85 — DVLAQAQSGT GKTA 71 — DVIAQAQSGT GKTA 82 --DTIGQAQSGT GKTA 83 --DVIAQSQSGT GKTA 90 --DVIAQSQSGT GKTA 93 — DVIAQAQSGT GKTA 7 8 — DVIAQAQSGT GKTS 7 0 -DVIAQAQSGT GKTA 73 -DVLAQAQSGT GKTG 7 4 -DVLAQAQSGT GKTA 7 5 — DILARAKNGT GKTA 98 — DILARAKNGT GKTA 96 — DILARAKNGT GKSG 14 8 — DILARAKNGT GKSG 123 — DILARAKNGT GKSG 137 --DILARAKNGT GKTG 14 7 — DILARAKNGT GKTA 159 — DVAGQAQTGT GKTM 61 — DVAGQAQTGT GKTM 61 — DLMACAQTGS GKTG 138 — DLMACAQTGS GKTG 138 — DLMACAQTGS GKTA 135 --DLMACAQTGS GKTA 135 --DLMACAQTGS--- GKTA 135 — - DLVACAQTGS GKTA 135 — DLMACAQTGS GKTA 135 --DLMACAQTGS GKTA 131 —DMVGVAQTGS : GKTL 14 6 —DMVGVAQTGS GKTL 14 6 — DMVGIAQTGS GKTL 14 4 --NFVGIAKTGS GKTL 14 8 --DMVGIAATGS GKTL 14 8 --DIVAIAKTGS GKTL 14 9 -RDNVAVMATGY GKSL 161 -RDNVVVMATGY GKSL 161 -QNTACIMPTGG GKSI 4 8 -D-CLVVMPTGG GKSL 55 -D-CFILMPTGG GKSL 152 -D-TFVLMPTGA GKSL 152 -D-VFILMPTGG GKSL 197 -DDLLIISATSS GKTL 156 -EDLMVVSATAS GKTL 150 —QLINDRCVDM QRSRHEK—K 153 --PLWQGCSAGL PWLPAEK—K 21 — PSVRREKNGN • VVDARCR—S 135 — PEVSKERKGT VVDEKCR--R 135 — PEVTPLRFGK DVDGKCH—S 136 — PEVTPLRFGK DVDGKCH—S 136 Appendices gi0119540 gi2134009 gi2058510 gill69345 gi3183486 gi2995310 gil706438 gi0281858 gi0281834 gi0544162 gill66504 gil931649 gi2072674 gi0464912 gi0861396 gil710074 gil363325 gi0130806 gil708151 gil066920 gi2500113 gi0790392 gill73121 gil418571 gi3036880 gi2408082 gi0116351 gi2642224 gil706437 gil752904 gi3257105 gil730960 gi2131417 gi2983030 gi3217395 gi3183240 gi2622454 NFYEKQEGEKLP-FLGLALSSRKNLCIH-EFYAKETGEVNN-FLALSLSSRKNLCIH-QYRNSEMGEESPKTLCMSMSSRRNLCIQ-KKALNFTG KIALLKGRANYLCLE-SKALKYTG NVALLKGRSNYLCLE-VDALHPQLRRRP-EFAMLKGRSNYLCLH-VDSLTNALPRRP-KFALLKGRRNYLCLN-KKIIPDLK FTAAFGRGRYVCPR-KKIIPDLK FTAAFGRGRYVCPR-KKIIPDLK 
FTAAFGRGRYVCPR--KKLTAQNFTAP-TPVQAQSWPVLLSGR-—NKKVFGNHSF-RPNQREIINATMSGS--RALGEEGIKRP-FAIQELTLPLALDGE-YRLHEVFKLPGF-RPNQLEAVNATLQGK--ALNEFFGHKGF-REKQWDVVRNVLGGK--TLYQFFGFTSF-KKGQQDIIESILSGK--NSLKKHGYEKP-TPIQTQAIPAIMSGR--VLITEKLHFGSLTPIQSQALPAIMSGR-AEISKFPKP-TPIQAVAWPYLLSGK--VFLELFCHKKYRSRLQMQAINCILKRK-RIWGYDHF-RYPQGEVIDCLLARR--RIVTCASRGTLCVNEEVKKLKLNHLIN-LNNLIENGFTEP-TPIQCECIPVALNNR--NRLSENSIRQP-SPIQMQSIPFMTERR--ELVGDREGSARLREDQWQAVAALVEE H --FT S RTH SQLQQLVQEIKKLNNQT FS T P--QFTSQLRLPSFPSSFRDKVPDEKVKYL---PEVTPLRFGK---PEVSSLRFGK-—PRVSEERDGK-—RLDQVIAQG--—RLEQQALAG--—RLHEGVPQDE-—KIHNSVTASD-DVDGKCH—S EVDGKCH--S VVDALCR—E VLGD GDLP -EEGLFDQFEAAAP—T -HDDERPQEELFDPVAV —NLTALASTEPTQQDLLAFLDDELTP—N —NLTALASTEPTQQDLLAFLDDELTP—N —NLTALASTEPTQQDLLAFLDDELTP--N —DL VG VAKT G S GKTL — DVFVLMPTGG GKSL — DVIGQARTGM-- GKTF --DVFVLMPTGG GKSL --DQFVLMSTGY GKSV --DTIAMLPTGG GKSL --DLIGIAKTGS GKTI --DVIGISKTGS GKTI — - DVVGVAETGS GKT F -CDVYVSLPTGA GKSL — DCLVVLPTGG GKSI -EKCMELRKNGMS EKEK —DVLACGPTGS -—GKTL —NVLASAPTGS GKTL -RRALVVQRTGW GKSA -IRVVSLASRKN LCINNEVRKLR -PLASKKQLCIN PKVMKWK AVARKLYCGDLQFRPGQRRAMLAIMGRRQAE-QVVVVMPTGA-DLPIVQDLFPFPVTAAILKGQSHYLCLY -SAYVPTIFYASRTHSQLEQVVHELNRT KIGEKSE-VSGIEFRSRKDLCLH---ISKLAEHLDLKIDTRLSKSHEQYLCLK— -EVYQGMEHENF-YSHQADAINSLHQGE— -RDIEFLAGHWKLLTGQKVNYAVIKGKG— HKQIAQVVKEFSRLPYAKILKHTILASR— QQVRIYEDL-SSLRHNLKVAFLMGK— GKSAIACTLAGIYQPAYILTMTKQLQDQ— GKSL DNYD - EH FCINQKVKKIK AYTS DDKWLD— —KFEQVLHEED— EYKWVKTTILGSR-—AYIQTFAQD —KLEKTMQRSD----NVIITTSTSS GKSL —NYLCIDRFKK EEIP —EQSCINPAAR KHADISQ —SNFICKSKG GKAN •YAREFGFPVVK GRSN 4r Domain I 136 136 135 129 123 139 148 69. 
69 148 149 152 62 199 159 46 145 143 152 153 49 152 144 152 57 152 146 65 155 150 115 144 147 118 148 110 138
gi0096321 -AFSLPLLQNLDPE- LK APQILVLAPTRELA-VQVAEAM 111
gi0461924 -AFSLPLLNNIDPE- LR APQILVLAPTRELA-VQVAEAM 110
gi1169261 -AFALPLLAQIDPS- EK H PQMLVMAPTRELA-IQVADAC 93
gi1706338 -AFAIPMLSKIDIT- SK VPQALVLVPTRELA-LQVAEAF 101
gi1001719 -AFALPLMDRIDPE- G DLQALILTPTRELA-QQVAEAM 94
gi2500540 -SFAIPLIELVNEN- N GIEAIILTPTRELA-IQVADEI 94
gi2648271 -AFALPILDLVDER- SK DVQAIVITPTRELA-LQVKDEI 96
gi1177010 -SFGIPLCELANWD- EN KPQALILTPTRELA-VQVKEDI 90
gi2621248 -AFAIPVLENLEA-- ER VPQALIICPTRELC-LQVSEEI 91
gi0266336 -TFAISILQQIELD- LK ATQALVLAPTRELA-QQIQKVV 108
gi0417180 -TFAISILQQIELD- LK ATQALVLAPTRELA-QQIQKVV 120
gi0050823 -TFAISILQQLEIE- FK ETQALVLAPTRELA-QQIQKVI 121
gi1708418 -TFSISVLQKIDTS- LK QCQALILAPTRELA-QQIQKVV 107
gi0446778 -TFSIAILQQIDTS- IR ECQALILAPTRELA-TQIQRVV 118
gi2150025 -TFVIAALQKIDYS- LN ACQVLLLAPTRELA-QQIQKVA 119
gi0729821 -TFSISVLQCLDIQ- VR ETQALILAPTRELA-VQIQKGL 126
gi2443810 -TFCVSVLQCLDIQ- IR ETQALILAPTKELA-RQIQKVL 129
gi2773184 -TFSISVLQSLDTQ- VR ETQALILSPTRELA-VQIQKVV 114
gi1170507 -MIALTVCQIVDTK- SS -- EVQALILSPTRELA-AQTEKVI 106
gi1352438 -TFSIGILQSIDLS- VR DTQALILS PTRELA-VQIQNVV 109
gi0124218 -TFSIAALQRIDTS- VK APQALMLAPTRELA-LQIQKVV 110
gi2500523 -TFTISALQRINEN- EK ATQALILAPTRELA-LQIKNVI 111
gi0729329 -AFVIPTLEKVKPK- LN --KIQALIMVPTRELA-LQTSQVV 134
gi1174456 -AFVIPSLEKVDTK- KS--- KIQTLILVPTRELA-LQTSQVC 132
gi1709532 -AYLIPLLERLDLK- KD NIQAMVIVPTRELA-LQVSQIC 184
gi2745894 -AYLIPLLERLDLK- ;KD NIQAMVIVPTRELA-LQVSQIC 159
gi2136088 -AYLIPLLERLDLK- KD NIQAMVIVPTRELA-LQVSQIC 173
gi1709533 -AYLIPLLERLDLK- KD CIQAMVIVPTRELA-LQVSQIC 183
gi3047117 -AFCIPVLEKIDQD- NN VIQGNCVLD HTSQVC 189
gi0132530 -AFLTSTFHYLLSHP -AIADRKVN QPRALIMAPTRELA-VQIHADA 104
gi0421132 -AFLTSTFHYLLSHP -AIADRKVN QPRALIMAPTRELA-VQIHADA 104
gi0003641 -GFLFPLFTELFRSGPSPVPEKAQ -- S -- FYSRKGYPSALVLAPTRELA-TQIFEEA 189
gi0118411 -GFLFPVLSESFKTGPSPQPESQG --s-- FYQRKAYPTAVIMAPTRELA-TQIFDEA 189
gi2500528 -AFLLPILSQIYADGPGEALRAMK -ENGRYGRRKQYPISLVLAPTRELA-VQIYEEA 189
gi3023628 -AFLLPILSQIYSDGPGEALRAMK -ENGRYGRRKQYPISLVLAPTRELA-VQIYEEA 189
gi0130256 -AFLLPILSQIYTDGPGEALRAMK -ENGKYGRRKQYPISLVLAPTRELA-VQIYEEA 189
gi2580554 -AFLLPILSQIYTDGPGEALKAVK -ENGRYGRRKQYPISLVLAPTRELA-VQIYEEA 189
gi0113825 -AFLLPILSQIYADGPGDAMKHLQ -ENGRYGRRKQFPLSLVLAPTRELA-VQIYEEA 189
gi2558533 -AFLLPVLSQIYTDGPGEALQAAKNSAQENGKYGRRKQYPISLVLAPTRELA-LQIYDEA 189
gi0129383 -SYLLPAIVHINH---QPFLERGD --G-- PICLVLAPTRELA-QQVQQVA 187
gi2500527 -SYLLPAIVHINH---HPFLERGD --G-- PICLVLAPTRELA-QQVQQVA 187
gi1592565 -AYLLPAIVHINH-- -QPYLERGD --G-- PICLVLAPTRELA-QQVQQVA 185
gi0133134 -GYILPAIVHINN-- -QQPLQRGD -- G ~ PIALVLAPTRELA-QQIQQVA 189
gi0118284 -SYCLPGIVHINA-- -QPLLAPGD -- G -- PIVLVLAPTRELA-VQIQTEC 189
gi1169228 -GYLMPAFIHLQ -QRRKNPQL -- G -- PTILVLSPTRELA-TQIQAEA 189
gi1280208 -CFQYPPVYVG KIGLVISPLISLMEDQVLQLK 192
gi2130973 -CFQYPPVYTG KIGIVISPLISLMEDQVLQLE 192
gi2619051 -CYQIPALMFE GTTIVISPLISLMKDQVDALE 79
gi2851488 -CYQIPALLLN GLTVVVSPLISLMKDQVDQLQ 86
gi1705486 -CYQLPACVSP GVTVVISPLRSLIVDQVQKLT 183
gi2276199 -CYQLPAVILP GVTVVVSPLRSLIEDQKMKMK 183
gi1175484 -CYQLPAVIEGGAS- RGVTLVISPLLSLMQDQLDHLR 232
gi2128837 -IGELAGIKNLIKT- GKKFLFLVPLVALANQKYLEFK 191
gi2621738 -IAELAGIPRALG-- GEKFIYLTPLVALANQKYRDFR 184
gi1666893 KGAEEEKPKRRRQE- KQAACPFYNHEQMGLLRDEALA 189
gi1666897 KGAEEEKPKRRRQE- KQAACPFYNHEQMGLLRDEALA 57
gi0005022 LTAGFVREQRLAG-- M DVPTCEFHDNLEDLEPHSLISN 171
gi0131812 MTNGQAKRKLEEDP- EA NVELCEYHENLYNIEVEDYLPK 173
gi2495145 LTASYVRAQYQQDA- SLPHCRFYEEFDAHGRQVPLPA 172
gi2495146 LTAS YVRAQYQQDA- SLPHCRFYEEFDIHGRQMPLPA 172
gi0119540 LTASYVRAQYQHDT-- SLPHCRFYEEFDAHGREVPLPA 172
gi2134009 LTASYIRAQRHSNP-- NQPVCRFYEEFDAVGRQVPLPA 172
gi2058510 LTSSWNR ESP-- TSEKCKFFENFESNGKEILLEG 167
gi1169345 KSVLAELSKVRKWN-- ; _ N S _ _ TKTGD-FTECIELAEDSPIIPQ 166
gi3183486 VQILSDVTLLRSWS-- NQ-- TVDGD-ISTCVSVAEDSQAWPL 160
gi2995310 SKLGQDLLRMRDWA-- DE-- AETGD-RDDLTPGVSD-RAWAQ 175
gi1706438 TALGRDVQRLTAWA-- ST-- TVSGD-RDDLKPGVGD-RSWSQ 184
gi0281858 NQEEQKRCAKLKGD-- LD-- TYKWDGLRDHTDIAIDDDLWRR 107
gi0281834 NQEEQKRCAKLKGD-- LD-- TYKWDGLRDHTDIAIDDDLWRR 107
gi0544162 NQEEQKRCAKLKGD-- LD-- TYKWDGLRDHTDIAIDDDLWRR 186
gi1166504 -GFMVPALAHIAVQE PLR SGD-- GPMVVVLAPTRELA-QQIEEET 190
gi1931649 -TYQLPALICG- GITLVISPLVSLIQDQIMNLL 183
gi2072674 -AFGVPLLQRITSG-- DG-- TRPLTGAPRALVVVPTRELC-LQVTDDL 104
gi0464912 -CYQLPAVVKSGKT-- HGTTIVISPLISLMQDQVEHLL 234
gi0861396 -CYQLPSLLLN SMTVVVSPLISLMNDQVTTLV 190
gi1710074 -CYQLPGYMLD GMVLIVSPLLSLMEDQVQQLK 77
gi1363325 -AFLLPMFRHIMDQ-- -RSLEEGE-- GPIAVIMTPTRELA-LQITKEC 186
gi0130806 -SYLLPLLRQVKAQR---PLSKHET- GPMGLILAPTRELA-LQIHEEV 185
gi1708151 -AFGVPAISHLMND-- QKKR-- GIQVLVISPTRELA-SQIYDNL 190
gi1066920 -CYQLPAVVHG GITVVISPLIALMKDQISSLK 184
gi2500113 -CFQLPALLG EGLTLVVSPLVALMEDQVQSLR 80
gi0790392 -VQKLEKGTTKKTKT- CATSCEFYNSTQIEDVVNGVLS 188
gi1173121 -AFLIPLVQQIIDD-- KQ-- TAGLKGLIISPTKELA-NQIFIEC 182
gi1418571 -AFALPVIDEILELKQRADYSSSN-- SSKLLAVVLEPTRELA-AQTYTEF 198
gi3036880 -VYFVATALLRRRG-- AGPTVIIS PLLALMRNQVEAAA 92
gi2408082 PTSALNEKCIELQG-- SAHKCPFLQDNTQLWDFRDEAL 188
gi0116351 TLEAINDACADLRHS- ; KEGCIFYQNTNEWRHCPDTLA 182
gi2642224 -LFMVGACLEG AETTILILPTVALRANMLAKLD 97
gi1706437 -AVLTKAQLLVWLT-- ETNTGDVAELNLPSGGKLLWDR 190
gi1752904 ESNRQAHVCRGLVS-- KRACHYYNKFDACTTDKMTEFL 186
gi3257105 -MIVCKSLRKLGKC-- KYYENLKEKRDRV-DEIVKFF 148
gi1730960 LYESLPSFVHESQA-- MQRFYPYGDRKQYANLSNEEWS 180
gi2131417 -IYQLAAIDLLLKD-- p -- ESTFMYIFPTKALAQDQKRAFK 183
gi2983030 EADRLEVETFMETE-- WDGDLTLTGLASETIVKINVDD 154
gi3217395 YCKEVNSAHSIGCS-- FKSAMKPRFEKALPLR-DHLERNG 185
gi3183240 -RLYCQLN KKCLYRPNKRPICYCGTKKQ 137
gi2622454 -FFCLNDNLESTCD-- MG-- TCQTLPSSEKFQCPYGVVRGET 175
<- Domain Ia ->
gi0096321 TDFSKHMRG VNVVALYGGQRYDVQLRALRQG-PQIVVGTPGRLLDHL 157
gi0461924 TEFSKHMRG VNVVALYGGQRYDVQLRALRQG-PQIVVGTPGRLLDHL 156
gi1169261 ELFVKYAQG TRIVTLYGGQRYDIQLRALKQG-AQVVVGTPGRILDHI 139
gi1706338 GRYGAYLSQ LNVL PIYGGS S YAVQLAGLRRG-AQVVVGT PGRMI DHL 147
gi1001719 KDFS-HERR LFILNVYGGQSIERQIRSLERG-VQIVVGTPGRVIDLI 139
gi2500540 ESLK-GNKN LKIAKIYGGKAIYPQIKALKN--ANIVVGTPGRILDHI 138
gi2648271 ESLR-GGKK VHVLAVYGGQPIFPQIERLRKG-VQIVVGTPGRVIDHL 141
gi1177010 TNIG-RFKR IKATAVFGKSSFDKQKAELKQK-SHIVVGTPGRVLDHI 135
gi2621248 KRIG-KYMK -VKVLAVYGGQSIGNQIAQLRRG-VHVIVATPGRLIDHI 136
gi0266336 MALGDYMG AS CH AC IGGTNVRAEVQKLQMEAPHIIVGT PGR VFDML 154
gi0417180 MALGDYMG ASCHACIGGTNVRAEVQKLQMEAPHIIVGTPGRVFDML 166
gi0050823 LALGDYMG ATCHACIGGTNVRNEMQKLQAEAPHIVVGTPGRVFDML 167
gi1708418 VALGDLMN VECHACIGGTLVRDDMAALQAG-VHVVVGTPGRVHDMI 152
gi0446778 MALGEYMK VHSHACIGGTNVREDARILESG-CHVVVGTPGRVYDMI 163
gi2150025 LALGDYCE LRCHACVGGTSVRDDMNKLKSG-VHMVVGTPGRVFDML 164
gi0729821 LALGDYMN VQCHACIGGTNVGEDIRKLDYG-QHVVAGTPGRVFDMI 171
gi2443810 LALGDYMN VQCHACIGGTNVGEDIRKLDYG-QHVVAGTPGRVFDMI 174
gi2773184 LALGDYMN VQCHACIGGTNLGEDIRKLDYG-QHVVSGTPGRVFDMI 159
gi1170507 LAIGDYIN VQAHACIGGKSVGEDIRKLEHG-VQVVSGTPGRVCDMI 151
gi1352438 LALGDHMN VQCHACIGGTSVGNDIKKLDYG-QHVVSGTPGRVTDMI 154
gi0124218 MALAFHMD IKVHACIGGTSFVEDAEGLRD--AQIVVGTPGRVFDNI 154
gi2500523 TAIGLYLK VT VHASIGGTSMSDDIEAFRSG-VQIVVGTPGRVLDMI 156
gi0729329 RTLGKHCG , ISCMVTTGGTNLRDDILRLNET-VHILVGTPGRVLDLA 179
gi1174456 KTLGKHMN VKVMVTTGGTTLRDDIIRLNDT-VHIVVGTPGRVLDLA 177
gi1709532 IQVSKHMGG AKVMATTGGTNLRDDIMRLDDT-VHVVIATPGRILDLI 230
gi2745894 IQVSKHMGG AKVMATTGPTNLRDDIMRLDDT-VHVVIATPGRILDLI 205
gi2136088 IQVSKHMGG AKVMATTGGTNLRDDIMRLDDT-VHVVIATPGRILDLI 219
gi1709533 IQVSKHMGG AKVMATTGGTNLRDDIMRLDDT-VHVVIATPGRILDLI 229
gi3047117 KELGKHLK IQVMVTTGGTSLKDDIMRLYQP-VHLLVGTPGRILDLT 234
gi0132530 EPLAEATG LKLGLAYGGDGYDKQLKVLESG-VDILIGTTGRLIDYA 149
gi0421132 EPLAEATG LKLGLAYGGDGYDKQLKVLESG-VDILIGTTGRLIDYA 149
gi0003641 RKFTYRSW VRPCVVYGGAPIGNQMREVDRG-CDLLVATPGRLNDLL 234
gi0118411 KKFTYRSW VKACVVYGGSPIGNQLREIERG-CDLLVATPGRLNDLL 234
gi2500528 RKFSYRSR : VRPCVVYGGAEIGQQIRDLERG-CHLLVATPGRLVDMM 234
gi3023628 RKFSYRSR VRPCVVYGGADIGQQIRDLERG-CHLLVATPGRLVDMM 234
gi0130256 RKFSYRSR VRPCVVYGGADIGQQIRDLERG-CHLLVATPGRLVDMM 234
gi2580554 RKFSYRSR VRPCVVYGGADIGQQIRDLERG-CHLLVATPGRLVDMM 234
gi0113825 RKFAYRSR VRPCVVYGGADIGQQIRDLERG-CHLLVATPGRLVDMM 234
gi2558533 RKFSYRSH VRPCVVYGGADIGQQIRDLERG-CHLLVATPGRLVDMM 234
gi0129383 AEYCRACR LKSTCIYGGAPKGPQIRDLERG-VEICIATPGRLIDFL 232
gi2500527 AEYCRACR LKSTCIYGGAPKGPQIRDLERG-VEICIATPGRLIDFL 232
gi1592565 DDYGKCSR LKSTCIYGGAPKGPQIRDLERG-VEICIATPGRLIDFL 230
gi0133134 TEFGSSSY VRNTCVFGGAPKGGQMRDLQRG-CEIVIATPGRLIDFL 234
gi0118284 SKFGHSSR IRNTCVYGGVPKSQQIRDLSRG-SEIVIATPGRLIDML 234
gi1169228 VKFGKSSR ISCTCLYGGAPKGPQLRELSRG-VDIVVATPGRLNDIL 234
gi1280208 MSNIPACFL GSAQSEN VLTDIKLGK YRIVYVTPEYCSGN- 231
gi2130973 LSNVPACLL GSAQSKN ILGDVKLGK YRVIYITPEFCSGN- 231
gi2619051 EAGINAAYI --NSTQSNQEIYERLNGLKEGA YKLFYITPERLTS-- 121
gi2851488 ANGVAAACL --NSTQTREQQLEVMTGCRTGQ IRLLYIAPERLML-- 128
gi1705486 SLDIPATYL TGDKTDSEATNIYLQLSKKDPI-IKLLYVTPEKICASN 229
gi2276199 ELGIGCEAL TADLGAPAQEKIYAELGSGNPS-IKLLYVTPEKISASG 229
gi1175484 KLNIPSLPL SGEQPADERRQVISFLMAKNVL-VKLLYVTPEGLASNG 278
gi2128837 ERYEKLGFK VSLRVGLGRIGKKVDV ETS-LD-ADIIVGTYEG-- 231
gi2621738 RRYSPLKLK TAIKVGMSRIRARDELRIPETD-VSKADIVVGTYEG-- 228
gi1666893 EVKDMEQLL A--LGKEARACPYYGSRLAIPA-AQLVVLPYQMLLHAA 233
gi1666897 EVKDMEQLL A--LGKEARACPYYGSRLAIPA-AQLVVLPYQMLLHAA 101
gi0005022 GVWTLDDIT E--YGEKTTRCPYFTVRRMLPF-CNVIIYSYHYLLDPK 215
gi0131812 GVFSFEKLL K--YCEEKTLCPYFIVRRMISL-CNIIIYSYHYLLDPK 217
gi2495145 GIYNLDDLK A--LGQRQGWCPYFLARYSILH-ANVVVYSYHYLLDPK 216
gi2495146 GIYNLDDLK A--LGQRQGWCPYFLARYSILH-ANWVYSYHYLLDPK 216
gi0119540 GIYNLDDLK A--LGRRQGWC PY FLARYSILH -ANVVVYSYHYLLDPK 216
gi2134009 GIYNLDDLK D--FGRRKGWCPYYLARYSILH -ANIVVYSYHYLLDPK 216
gi2058510 -VYSLEDLK E--YGLKHQMCPYFLSRHMLNF -ANIVIFSYQYLLDPK 210
gi1169345 LTSTAESCL G-TDCPNYSECYVASARKKALN -ADLVVVNHHLFFADM 211
gi3183486 VTSTNDNCL G- S DC PMYKDC FWKARKKAMD -ADVVVVN H H L FLADM 205
gi2995310 VSVSSRECL GAS KCAYGAEC FAETARERAKL -SEVVVTNHALLAIDA 221
gi1706438 VSVSARECL GVARCPFGSECFSERARGAAGL -ADVVVTNHALLAIDA 230
gi0281858 LSTDKASCL N-RNCYYYRECPFFVARREIQE -AEVVVANHALVMAAM 152
gi0281834 LSTDKASCL N-RNCYYYRECPFFVARREIQE -AEVVVANHALVMAAM 152
gi0544162 LSTDKASCL-- N-RNCYYYRECPFFVARREIQE -AEVVVANHALVMAAM 231
gi1166504 KKVIPGD VYCGCVYGGAPKGPQLGLLRRG -VHILVATPGRLIDFL 234
gi1931649 QANIPAASL SAGMEWAEQLKIFQELNSEHSK -YKLLYVTPEKVAKSD 229
gi2072674 ATAGKYLTAGPDTDDAAAVRRRLSVVSIYGGRPYEPQIEALRAG -ADVVVGTPGRLLDLC 163
gi0464912 NKNIKASMFS SRGTAEQRRQTFNLFINGLLDLVYISPEMISASE 278
gi0861396 SKGIDAVK LDGHSTQIEWDQVANNMHR--IRFIYMSPEMVTSQK 232
gi1710074 ARGEKRAA ALNSMLNRQERQFVLEHIHR- -YKFLYLSPEALQSPY 120
gi1363325 KKFSKTLG LRVVCVYGGTGISEQIAELKRG -AEIIVCTPGRMIDML 231
gi0130806 TKFTEADTS IRSVCCTGGSEMKKQITDLKRG -TEIVVATPGRFIDIL 231
gi1708151 IVLTDKVG MQCCCVYGGVPKDEQRIQLKKS --QVVVATPGRLLDLL 234
gi1066920 RKGIPCETL NSTLTTVERSRIMGELAKEKPT -IRML YLT AEGVA