UBC Theses and Dissertations

Concerted evolution of a cluster of X-linked tRNA4,7Ser genes from Drosophila melanogaster. Leung, Jeffrey, 1988

Full Text

CONCERTED EVOLUTION OF A CLUSTER OF X-LINKED tRNA4,7Ser GENES FROM Drosophila melanogaster

by

JEFFREY LEUNG
B.Sc., University of British Columbia, 1980

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
GENETICS PROGRAM
DEPARTMENT OF ZOOLOGY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
January, 1988
© Jeffrey Leung, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Zoology
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

ABSTRACT

Multigene families have posed an acute problem for evolutionary biologists ever since the revelation that many families exhibit unexpected sequence homogeneity within and between individuals of a species. A family that is shared between several species, in contrast, often reveals substantial heterogeneity between them. This cohesive and species-specific pattern of variation, which disengages from the classical mode of random genetic drift and selection, has been formally described as Molecular Drive (Dover, 1982). Based on initial observations (Cribbs, 1982), the tRNA4Ser and tRNA7Ser genes on the X-chromosome of Drosophila melanogaster also showed intriguing characteristics reminiscent of Molecular Drive. However, in this unusual case, the coevolution process would not only encompass the individuals within a family, but would also ensnare members from a different family.
This thesis is an in-depth study of the concerted evolution of both gene families and provides evidence consistent with the view that they are undergoing Molecular Drive. Eight tRNA4,7Ser genes have been cloned from bands 12DE on the X-chromosome of D. melanogaster by molecular walking. There are two tRNA4Ser and two tRNA7Ser genes that contain sequences expected from their known tRNAs (Cribbs et al., 1987a). Of the 86 nucleotides, they differ from each other only at positions 16, 34 and 77 (non-standard numbering, see Sprinzl et al., 1987). The difference at position 34 corresponds to the anticodon and accounts for their difference in codon recognition. These genes have been designated as either 444 or 777 genes, based solely on the three diagnostic differences. However, there are also a single 474 and two 774 genes, which are recombinant structures of the bona fide genes. The remaining gene, 444*, has the three nucleotides diagnostic of tRNA4Ser but contains a mutation at the tip of the extra arm. Thus collectively, the entire cast of tRNA4,7Ser genes at 12DE forms a graded series of transitional states, bridging the narrow sequence variability between true tRNA4Ser and tRNA7Ser. Flanking sequences of these hybrid and the 444* genes show segmental homologies related to both the 444 and 777 genes within the cluster, again a strong indication that both gene types are undergoing concerted evolution. Examination of selected genes from two distantly related sibling species, D. erecta and D. yakuba, shows that their equivalent flanking sequences have diverged from those of D. melanogaster. As expected, the base changes in these species, often occurring as clusters, are also non-random and appear to have been propagated to certain respective members to maintain a species-specific and cohesive pattern of variation consistent with Molecular Drive.
One possible mode of spreading sequence variation and creating the hybrid genes in the process could involve an initial stage of asymmetric pairing between 444 and 777 DNA. To examine this possibility, a tRNAArg gene cluster, also from 12DE, was conveniently exploited as independent "monitors". This family shows fluctuations in the number of genes among the different species and strains (Newton, unpublished), which could also be explained by asymmetric pairing of DNA followed by unequal exchange. Thus, even though the tRNAArg and tRNA4,7Ser genes have embarked on different evolutionary pathways, both phenomena may be explained by their common susceptibility to local asymmetric pairing of DNA.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Abbreviations
Acknowledgments
Dedication

Introduction
Modified nucleotides in tRNAs
The Universal Cloverleaf
The tRNA Tertiary Structure
Structures of Eukaryotic tRNA Genes
Intron-Containing tRNA Genes
tRNA Variant Genes and Pseudogenes
Number, Diversity and Organization of tRNA Genes
Transcription of tRNA Genes
Flanking Modulatory Sequences
Formation of Transcriptional Complexes
Maturation of tRNA Transcripts
Other Unusual tRNA-mediated Cellular Functions in Eukaryotes
1. Protein Degradation
2. Primers for Reverse Transcription of Retrotransposons
3. Chlorophyll Biosynthesis
4. Induced and Naturally Occurring Suppressor tRNAs
The Present Studies

Methods and Materials
REAGENTS
Enzymes Used in Molecular Cloning
Oligonucleotides
Nucleotides
Phenol
Formamide
Acrylamide
Agarose
Galactosides
Autoradiography
Supplies for Culture Media
BACTERIAL STRAINS
CULTURE MEDIA AND CONDITION
E. coli
Fruitflies
MASS COLLECTION OF EMBRYOS FROM D. melanogaster
PLASMIDS AND BACTERIOPHAGE VECTORS
INTRODUCTION OF PLASMID AND DOUBLE-STRANDED BACTERIOPHAGE M13 DNA INTO E. coli
Reagents
BACTERIAL TRANSFORMATION
ISOLATION OF PLASMID AND DOUBLE-STRANDED M13 DNA
LARGE-SCALE DNA ISOLATION
Plasmid DNA
Double-Stranded M13
Lysis by Triton X-100
Lysis by Alkali - Large-Scale DNA Preparation
CsCl Gradient Purification of DNA
Purification of DNA by Column Chromatography
Small-Scale Mini-Preparation
ISOLATION OF TEMPLATE DNA FOR SEQUENCING
Double-Stranded DNA
Single-Stranded DNA
Bacteriophage M13
The pEMBL Plasmids
Preparation of the Helper Phage IR1
DNA SEQUENCING
Chain-Terminator Method
Single-Stranded DNA Templates - M13 and pEMBL Plasmids
Double-Stranded DNA Templates - pUC13 and Double-Stranded pEMBL Plasmids
Chain-Termination Reactions
Single-Stranded Templates
Double-Stranded Templates
Purification of Radiolabeled Restriction Fragments For Maxam-Gilbert Sequencing
TREATMENT OF GLASSWARE AND PLASTICWARE
ISOLATION OF GENOMIC DNA FROM Drosophila
Quick Method
Large-Scale Method I
Large-Scale Method II
PARTIAL DIGESTION OF GENOMIC DNA FOR LIBRARY CONSTRUCTION
D. melanogaster Genomic DNA
D. erecta and D. yakuba Genomic DNAs
SIZE FRACTIONATION OF D. melanogaster DNA
NaCl Linear Gradient
Gel Fractionation
CONSTRUCTION OF GENOMIC LIBRARIES
Choice of Cloning Vectors
Bacteriophage Lambda
Large-Scale Lambda Preparation
CsCl Gradient Purification of Live Lambda Bacteriophage
Preparation of Lambda Vector Arms
Preparation of Cosmid DNA
Preparation of Cosmid Vector Arms
Ligation of Lambda Vector Arms to Drosophila DNA
D. melanogaster Libraries
Drosophila Sibling Species Libraries
Ligation of Cosmid Arms to D. melanogaster DNA
cosPneo Vector
pJB8 Vector
IN VITRO PACKAGING OF BACTERIOPHAGE AND COSMID DNA
Freeze-Thaw Two-Strain Packaging Extracts
In vitro Packaging Using the Two Strain System
cos' Packaging Extracts
In vitro Packaging Using cos' Extracts
AMPLIFICATION OF GENOMIC LIBRARIES
PREPARATION OF RADIOLABELLED PROBES
Nick-Translation
Oligonucleotide Probes
Construction of tRNA4,7Ser- and tRNAArg-Specific Probes by Strand Synthesis
EMPIRICAL EVALUATION OF GENOMIC LIBRARIES BY SOUTHERN HYBRIDIZATION
SCREENING GENOMIC LIBRARIES
Plating Bacteriophage λ Libraries
Plating Cosmid Libraries
Lysis of Membrane Bound Bacteriophages or Bacterial Colonies
Prehybridization
Hybridization
Isolation and Purification of λ Clones
Isolation and Purification of Cosmid Clones
RESTRICTION ENDONUCLEASE DIGESTS
GEL-ELECTROPHORESIS
Agarose Gels
Acrylamide Gels
RECOVERY OF RESTRICTION FRAGMENTS FROM GELS
Agarose Gels
Acrylamide Gels
SOUTHERN TRANSFER
RESTRICTION MAPPING
Low Resolution Restriction Endonuclease Mapping
Restriction Endonuclease Mapping by Partial Digestion
A Novel Restriction Endonuclease Mapping Method by Indirect Labelling with Sequencing Oligonucleotide Primers
MOLECULAR CLONING IN PLASMID AND DOUBLE-STRANDED M13 BACTERIOPHAGE VECTORS
Restriction Endonuclease Digestion of Vector DNA
Dephosphorylation of Vector DNA
DNA Ligation
PREPARATION OF [3'-32P]tRNA
Synthesis of Cytidine 3',5'-diphosphate
RNA Ligase-Catalyzed Addition of [5'-32P]-pCp
DNA DOT BLOTS
ORIENTATION OF tRNA4,7Ser GENE TRANSCRIPTION

CHAPTER I
Characterization of the Entire tRNASer Gene Cluster at Polytene Bands 12DE by Chromosomal Walking
RESULTS
1 (A). A Chromosomal Walk in the pDt73 Region
(B). Interspersed and Tandemly Repeated Elements
2 (A). Chromosomal Walk in the pDt17R Region
(B). Localization of tRNA Genes Within the Walk
(C). Sequence Analyses of pE4.6 and pE1.8
3 (A). Chromosomal Walk in the pDt27R Region
(B). Localization of tRNA Genes Within the Walk
4 (A). Chromosomal Walk in the pDt16R Region

CHAPTER II
Flanking Sequence Relatedness in the tRNA4,7Ser Genes in D. melanogaster And Sibling Species
RESULTS
Part I: Homologies in the 5'-Flanking Regions of the melanogaster Hybrid Genes: Wedded Patchworks
(A). The 474 Gene is Most Closely Related to the 444-1 Gene
(B). The 774 Gene in pDt16R is Most Closely Related to pDt17R-777
(C). The 774 Gene in pDt17R is Possibly Related to pDt16R-777
(D). The 444* Gene Has a Patchwork 5'-Flanking Region Characteristic of pDt17R-777 and 444-1 Genes
Sequence Homologies in 3'-Flanking Regions
Part II: Examination of Loci Homologous to pCS474 and pDt16R in Drosophila Sibling Species
(A). Detection of Homologous DNA By Genomic Southern Hybridization
(B). Isolation of pCS474 Homologous Fragment from D. erecta
(C). Isolation of pDt16R Homologous Fragment from D. erecta
(D). Isolation and Sequencing of the D. erecta tRNASer Genes Homologous to pDt27R
(E). Isolation of pDt16R Homologous DNA Segment From D. yakuba
(F). Analysis of pDt27R Homologous Region in λDY16-82
Part III: Rates of Flanking Sequence Divergence in Homologous tRNASer Genes From Different Drosophila Species

CHAPTER III
tRNAArg Genes at 12DE
RESULTS
tRNAArg Genes in λDE16 From D. erecta
tRNAArg Genes in λDY16-82 From D. yakuba

DISCUSSION
1. The Overall Molecular Organization of 12DE
2. Co-evolution of the tRNA4,7Ser Genes
3. A Model Postulating the Origin of Type II Homology Patches
4. Possible Mechanisms Involved in Generating the Hybrid Genes
5. tRNAArg Genes From Drosophila Sibling Species

APPENDIX
CHAPTER IV
tRNA3bVal Genes and Related Sequences
RESULTS
Sequence Analysis of pDt41R
Homologies With Another tRNA3bVal-Containing Plasmid
DISCUSSION
CHAPTER V
Dosage Compensation
Regulatory Genes
cis-Acting Regulatory Sequences
RESULTS
DISCUSSION

REFERENCES

LIST OF TABLES

Table I. List of Oligonucleotides
Table II. Deoxy-dideoxyribonucleoside Triphosphate Mixes For Chain-Termination Sequencing
Table III. DNA Sequencing Reactions by the Maxam-Gilbert Method
Table IV. Specific Buffers For Restriction Enzymes Used In Library Construction
Table V. A Summary of tRNA Genes Identified in Bands 12DE
Table VI. A Summary of Identified Drosophila melanogaster Variant Genes

LIST OF FIGURES

Figure 1. Generalized two-dimensional representation of a tRNA molecule
Figure 2. Tertiary interactions in yeast phenylalanine tRNA
Figure 3. Two typical chromosomal sites enriched for tRNA genes
Figure 4. 5'-Flanking sequences of different tRNA genes from S. cerevisiae
Figure 5. Splicing of tRNA in yeast
Figure 6. Sequence of tRNA7Ser
Figure 7. Fractionation of MboI partial digest of Oregon-R DNA by NaCl gradient
Figure 8. Restriction maps of cosPneo and λEMBL3
Figure 9. Systematic testing of intactness of the restriction ends in both vector and genomic DNAs before packaging
Figure 10. Restriction mapping by oligonucleotide indirect labelling method
Figure 11. Transcription orientation of tRNA genes
Figure 12. Molecular walk in the pDt73 domain
Figure 13. DNA sequence of tRNASer 474 gene from Canton S is identical to its homologue in Oregon-R
Figure 14. Interspersed repeated sequences shared between pDt73 and pDt17R domains
Figure 15. Hybridization of a 1.3 kb SstI fragment corresponding to one repeat unit of the Stellate sequences to fly strains deficient for polytene bands 12DE
Figure 16. Chromosomal walk in the pDt17R domain
Figure 17. Restriction mapping of pE4.6 by the oligonucleotide indirect labelling method
Figure 18. Localization of the tRNASer gene in pE1.8 by the oligonucleotide indirect labelling method
Figure 19. Nucleotide sequence of the 444* gene in pE4.6 from the Oregon-R strain
Figure 20. Nucleotide sequence of 774 gene in pE1.8 from Oregon-R
Figure 21. Chromosomal walk in the pDt27R domain
Figure 22. Sequence of pArg12.6 of D. melanogaster
Figure 23. Chromosomal walk in the pDt16R domain
Figure 24. DNA sequence of the pCS16-777 gene from Canton S
Figure 25. DNA sequence of pCS16-774 gene from Canton S
Figure 26. 5'-Flanking homologies in tRNA4,7Ser genes in D. melanogaster
Figure 27. 3'-Flanking homologies of non-allelic genes of D. melanogaster
Figure 28. Evolutionary relationship among the eight species of the Drosophila species subgroup based upon their polytene chromosome banding patterns
Figure 29. Genomic Southern blot of Drosophila sibling species subgroup with probe pCS474
Figure 30. Genomic Southern blot of Drosophila sibling species subgroup with probe pDt16R
Figure 31. Subclone of the 8.5 kb BamHI fragment from λDE73 from D. erecta
Figure 32. Nucleotide sequence of the 474 gene from D. erecta
Figure 33. Restriction map of λDE16
Figure 34. Nucleotide sequence of the 774 gene from D. erecta
Figure 35. Nucleotide sequence of the 777 gene from D. erecta
Figure 36. Sequence of D. erecta 444-1 gene
Figure 37. tRNA4Ser gene of D. erecta homologous to pDt27R 444-2 gene of D. melanogaster
Figure 38. Localization of the 444-1 and 444-2 genes in the 1.6 kb BamHI fragment of λDE16 by oligonucleotide indirect labelling method
Figure 39. Restriction map of pDt16R/pDt27R homologous region from D. yakuba
Figure 40. Localization of the 777 gene in λDY16-82 by the oligonucleotide indirect labelling method
Figure 41. Sequence of the 777 gene in D. yakuba
Figure 42. Localization of the 774 gene in λDY16-82 by oligonucleotide indirect labelling method
Figure 43. Nucleotide sequence of the 774 gene from D. yakuba
Figure 44. Sequence of the 444-2 gene in D. yakuba that is homologous to the D. melanogaster pDt27R 444-2
Figure 45. Comparison of 5'-flanking sequences among homologous genes from different Drosophila species
Figure 46. Comparison of 3'-flanking sequences among the homologous bona fide genes from the different Drosophila species
Figure 47. Evidence for concerted evolution of tRNA4,7Ser genes
Figure 48. Sequence of pDeArg-1 from D. erecta
Figure 49. Sequence of 3'-end of pDeArg-6 from D. erecta
Figure 50. Nucleotide sequence of pDyArg-1 from D. yakuba
Figure 51. tRNA4,7Ser and tRNAArg genes at 12DE
Figure 52. The current progress in the assignment of the X-linked tRNA genes to polytene bands
Figure 53. Genealogy delineating the formation of the tRNASer genes at 12DE
Figure 54. Schematic stepwise diagrams showing the possible lineages of hybrid tRNA4,7Ser genes encountered at 12DE based on their shared flanking homologies
Figure 55. The four tRNAArg genes and flanking sequences between the direct repeats from the Drosophila sibling species are summarized
Figure 56. The sequenced segments of pDt41R, pDt48 and the corresponding region from Canton-S from region 1 of chromosomal site 90BC are shown
Figure 57. The cloverleaf structure of Drosophila tRNA3bVal with sites of the four differences indicated for a hypothetical product from the variant gene
Figure 58. Restriction maps of pDt48 and pDt41R as constructed by the Smith and Birnstiel method
Figure 59. Homologies among all possible HinfI fragments in pDt48 and pDt41R as deduced by Southern-cross hybridization
Figure 60. A summary of repeated sequences from 12DE in the cloned 157 kb
Figure 61. Measuring copy number of the repeats in pDt73
Figure 62. Molecular cloning of the white locus

LIST OF ABBREVIATIONS

A  adenosine
ATP  adenosine 5'-triphosphate
A260  absorbance at 260 nm
A260 units  the amount of material giving an absorbance of 1.0 in 1.0 ml of solution in a 1 cm light path at 260 nm at neutral pH
bp  base pairs
BSA  bovine serum albumin
C  cytosine
CH buffer  40 mM Tris, pH 8.0, 1 mM spermidine, 1 mM putrescine, 0.1% β-mercaptoethanol, 7% DMSO
CIP  calf intestinal phosphatase
Cm  2'-O-methylcytidine
D  dihydrouridine
ddNTP  2',3'-dideoxyribonucleoside triphosphate (nucleosides may be specified as G, A, T, or C)
DEAE  diethylaminoethyl
dNTP  2'-deoxyribonucleoside triphosphate (nucleosides are specified as G, A, T, and C)
DMS  dimethylsulfate
DMSO  dimethyl sulfoxide
DNA  deoxyribonucleic acid
DNase  deoxyribonuclease
DTT  1,4-dithiothreitol
EDTA  ethylenediaminetetraacetate
G  guanosine
HEPES  N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid
HZ  hydrazine
i6A  N6-isopentenyladenosine
IPTG  isopropyl β-D-thiogalactoside
kb  kilobase pairs
LB  Luria-Bertani
m7G  7-methylguanosine
M+G  Maxam and Gilbert
P2O5  phosphorus pentoxide
pCp  cytidine 3',5'-diphosphate
PEG  polyethylene glycol
PFU  plaque forming units
pol  RNA polymerase
Q  7-(4,5-cis-dihydroxy-1-cyclopenten-3-ylaminomethyl)-7-deazaguanosine
R  a purine nucleoside
RNase  ribonuclease
rpm  revolutions per minute
rRNA  ribosomal RNA
RT  room temperature
S  Svedberg, sedimentation unit
SDS  sodium dodecyl sulfate
snRNAs  small nuclear ribonucleic acids
SSC  standard saline citrate (0.15 M NaCl, 0.015 M sodium citrate)
T  thymidine
t6A  N-[(9-β-D-ribofuranosylpurin-6-yl)carbamoyl]threonine
TBE  Tris:borate:EDTA electrophoresis buffer
Tris  Tris(hydroxymethyl)aminomethane
TE  10 mM Tris-Cl, pH 8.0, 1 mM EDTA
200 mM Tris-Cl,
pH 8.0, 200 mM NaCl, 1 mM EDTA
TEMED  N,N,N',N'-tetramethylethylenediamine
U  uridine
UV  ultraviolet
X-gal  5-bromo-4-chloro-3-indolyl-β-D-galactoside
Y  a pyrimidine nucleoside
Y base  α-(carboxyamino)-4,9-dihydro-4,6-dimethyl-9-oxo-1H-imidazo[1,2-a]purine-7-butyric acid dimethyl ester

ACKNOWLEDGEMENT

I would like to thank, with deep sincerity, Dr. Gordon Tener for his inexhaustible patience, energy, encouragement and financial support throughout the entirety of this project. Sine quibus non .... in the most literal sense; to you with gratitude. I also thank him for his humorous (unintended, I think!) and philosophical discourse on "why rotors have lids" and his generosity in paying for the damages after I left my mark in history; to Dr. Sinclair for generously contributing ideas and help in many phases of this project. The benefits derived from working with such an intellectually tenacious individual could never be overstated; to Dr. Hayashi for her excellent in situ hybridization experiments in clarifying many gene mapping studies; to Dr. Ian Gillam who unwittingly introduced me to the beauty of Arabidopsis. I would also like to thank Dr. Tom Grigliatti for reading and correcting this thesis, which definitely made it more compact and digestible; but most of all, free rein into my intellectual twilight zones. Lastly, I thank the individual who had the foresight to stock up on the lab supply of Tylenol.

For my favorite people - Mom and Dad, who suffered hard and brutally long through it all; this thesis truly rings of hollow consolation. Kathleen, never the one who is bound by longings to which the fruit is sorrow, whose courage to move on is inspiring and whose warm friendship will forever lodge in the dells of my memory.

INTRODUCTION

Transfer RNA [tRNA] is a small species of RNA in the cell. It sediments at about 4S, and is a mixture of molecules some 74-92 nucleotides in length.
This is the "adaptor" species first proposed by Crick (1966) as the agent responsible for fitting amino acids to their correct nucleotide triplets on the messenger during protein synthesis. Each transfer RNA thus has the dual properties of specifically recognizing both its particular amino acid and the codon representing it. The first function is achieved by its interaction with the amino acid activating enzymes; the second is mediated by the ribosome, which enables a triplet in messenger RNA to be recognized by a complementary trinucleotide sequence of bases in the tRNA, the anticodon. By now, the sequences of several hundred tRNAs from a variety of different organisms have been determined (Sprinzl et al., 1987). In all cytoplasmic tRNAs, despite the variation in their exact nucleotide sequences, certain positions are well conserved from one tRNA to another, such as positions U8, A14, G18, G19, A21, U33, G53, T54, Ψ55, C56, A58, C61, C74, C75 and A76 (Ψ is pseudouridine). Some other positions are almost always occupied by a pyrimidine (Y) and yet other sites by purines (R): Y11, R15, R24, Y32, R37, Y48, R57 and Y60 (Sprinzl et al., 1987). Also, the 3' end of each mature tRNA always terminates in cytidylic acid, cytidylic acid, adenylic acid (CCA-OH). The other end, the 5' end, always carries a 5'-terminal phosphate group and is often guanylic acid. The patterns described above are generalizations distilled from many tRNA sequences, and any particular tRNA may show critical differences from the general rule. One class of tRNAs, the initiator methionine tRNAs, is distinctly different from the rest. Prokaryotic tRNAfMet lacks a base-pair at the 5' end of the acceptor stem and has an A11-U24 base-pair in the D-stem rather than the usual Y11-R24 pair. In eukaryotic tRNAiMet, T54Ψ55 is replaced by A54U55 and Y60 is replaced by A.
In tRNAiMet of higher eukaryotes the normally invariant U33 adjacent to the anticodon is replaced by a C residue (Sprinzl et al., 1987; Addison, 1982). The deviations of the initiator methionine tRNAs from the "standard" form may reflect their special status in protein synthesis, as they are also recognized by a set of translation factors different from the elongation factors (EF-Tu or eEF1) that bring other tRNAs to the ribosome (reviewed by Lewin, 1987).

Modified Nucleotides in tRNAs

A further striking aspect of all tRNA molecules is their high content of unusual bases (other than G, A, U, or C). Over 50 different modified bases have been isolated from tRNA. Many of these unusual bases arise by enzymatic modification of preexisting bases or of the ribose moieties, such as the addition of methyl (CH3) groups, or by replacement of the oxygen atoms in the bases by sulfur (Kim, 1978; reviewed by Nishimura, 1978). Although the function of most of the unusual bases is not yet understood, most of them occur only at one or a few characteristic positions in the tRNA structure. Often the same modified nucleotide or its derivatives occupy the same site in homologous tRNAs from a wide variety of organisms. These observations suggest that the modified bases of tRNA play some important roles in structure and function. Modified nucleotides in the first, or "wobble", position (nucleotide 34) of the anticodon are directly involved in the codon-anticodon interaction. Unmodified A or U residues are almost never found in the "wobble" position; adenosine is usually modified to inosine (I). In ribosome binding assays, I in this position is capable of pairing (or "wobbling") with A, C or U in the third position of the codon. Uridine in the "wobble" position is often modified to 2-thiouridine or its derivatives. These nucleotides will pair only with A in the third position of the codon.
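The two wobble pairings just described can be written down as a small lookup table. The following Python sketch is illustrative only: the names `WOBBLE_PARTNERS` and `can_read` and the label `s2U` for 2-thiouridine are assumptions introduced here, and only the two modifications named in the preceding sentences are encoded.

```python
# Codon third-position bases that a modified anticodon wobble base
# (position 34) can read, limited to the two cases stated in the text.
# WOBBLE_PARTNERS, can_read and the "s2U" label are hypothetical names.
WOBBLE_PARTNERS = {
    "I": {"A", "C", "U"},  # inosine, derived from adenosine
    "s2U": {"A"},          # 2-thiouridine and its derivatives
}


def can_read(wobble_base: str, codon_third_base: str) -> bool:
    """Return True if the wobble base can pair with the codon's third base."""
    if wobble_base not in WOBBLE_PARTNERS:
        raise ValueError(f"no rule recorded for wobble base {wobble_base!r}")
    return codon_third_base in WOBBLE_PARTNERS[wobble_base]
```

A table of this kind makes it easy to see that inosine widens the set of readable codons whereas 2-thiouridine narrows it.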
In E. coli tRNA a U in the "wobble" position is sometimes modified to uridine-5-oxyacetic acid, permitting pairing with A, G or U in the codon. The hypermodified base Q (derived from G) or its glycosylated derivatives are found in the first position of the anticodon of some tRNAs. This will base pair with either U or C but has a greater affinity for U. The third position of the anticodon (nucleotide 36) pairs with the first position (5'-end) of the codon during translation. This interaction must be highly specific to avoid accumulation of error during protein synthesis. If a tRNA has an A residue in the third position of the anticodon, the A is almost invariably flanked on the 3' side by a hydrophobic base such as N6-isopentenyladenine, Y base, or its derivatives. If a tRNA has a U residue in the third position of the anticodon, the hydrophilic nucleoside t6A or its derivatives are found immediately 3' to the anticodon. The hypermodified bases could function to stabilize the A-U base pairing between the first position of the codon and the third position of the anticodon. G or C residues at the third position of the anticodon are flanked by simple methylated purines or by unmethylated A. Other modified bases, such as m7G, are found only in tRNAs specific for certain amino acids. Yet others are found only in the tRNAs of some organisms. For instance, 4-thiouridine is restricted to prokaryotic tRNAs while 5-methylcytosine is found only in those of eukaryotes. Any specific functions associated with these modified nucleotides remain speculative at this point.

The Universal Cloverleaf

Usually, each amino acid is represented by more than one tRNA. These multiple tRNAs charged with the same amino acid are called isoaccepting tRNAs.
A group of isoaccepting tRNAs is thought to be charged only by a single aminoacyl-tRNA synthetase specific for their amino acid; thus, these isoacceptors must share some common feature(s) enabling the enzyme to distinguish them from the other groups of isoaccepting tRNAs. The entire complement of tRNAs is divided into 20 isoaccepting groups; each group is able to identify itself to its particular or cognate synthetase. The common or distinctive features that distinguish one group of isoacceptors from another are not known. Early attempts to correlate these identity features with the primary sequences of tRNAs met with failure. Even though the sequences of the tRNAs are variable, they nonetheless conform to the same general secondary structure. Each tRNA sequence can be written in the form of a cloverleaf, maintained by base pairing between short complementary regions (Fig. 1). There are four major arms, named for their structure or function. The acceptor arm consists of a base-paired stem that ends in an unpaired sequence whose free 2'- or 3'-OH group is aminoacylated. The other arms consist of base-paired stems and unpaired loops. The "TΨC arm" is named for the presence of this triplet sequence; the "anticodon arm" always contains the anticodon triplet in the center of the loop, and the "D arm" is named for its content of the base dihydrouridine. The most variable feature of tRNA is the so-called "extra or variable arm", which lies between the anticodon and the TΨC arms. Depending on the length of the extra arm, tRNAs can be divided into two classes. "Class 1 tRNAs" have a small extra arm, consisting of only 3-5 bases, and represent about 75% of all tRNAs. "Class 2 tRNAs" (tRNAsLeu, tRNAsSer and the prokaryotic tRNAsTyr) have a large extra arm with 13-21 bases, and about 5 base pairs in the stem. In fact, in this class of tRNAs, it could be the longest arm in the entire tRNA molecule.
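The two-class division by extra-arm length can be expressed as a short function. This is a minimal sketch under the size ranges quoted above (3-5 bases for class 1, 13-21 bases for class 2); the function name `trna_class` is hypothetical, and lengths outside both ranges are rejected rather than guessed at.

```python
def trna_class(extra_arm_length: int) -> int:
    """Classify a tRNA by the number of bases in its extra (variable) arm.

    Uses only the ranges given in the text: 3-5 bases for class 1 and
    13-21 bases for class 2. Any other length raises ValueError.
    """
    if 3 <= extra_arm_length <= 5:
        return 1
    if 13 <= extra_arm_length <= 21:
        return 2
    raise ValueError(
        f"extra arm of {extra_arm_length} bases fits neither class range"
    )
```

For example, a serine tRNA with a long extra arm would fall in class 2, whereas a tRNA with a 4-base extra arm would fall in class 1.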
The base pairing that maintains the secondary structure is virtually invariant. There are always 7 base pairs in the acceptor stem, 5 in the TΨC arm, 5 in the anticodon arm, and usually 3 (sometimes 4) in the D arm. Within a given tRNA, most of the base pairings will be conventional partnerships of A-U and G-C, but occasional G-U, G-Ψ, or A-T pairs are found. The additional types of base pairs are less stable than the conventional pairs, but still allow a double-helical structure to form in RNA.

Fig. 1. Generalized two-dimensional representation of a tRNA molecule. By convention, the tRNA is written in the form of a cloverleaf structure. The various arms of the molecule (acceptor stem, D arm, anticodon arm, extra arm and TΨC arm) are indicated in the diagram. Some of the more conserved positions are indicated as the actual base, rather than a number. For semi-invariant bases, Py and Pu are used to indicate the presence of either pyrimidine or purine. An asterisk indicates that the base is modified, but that the form of the modification may vary. Circles signify highly variable positions where extra bases are often found (for example at positions 17 and 20, and at the variable arm). Base pairing within the stem structures is shown as lines. (Modified from Lewin, 1987, and Kim et al., 1974.)

The tRNA Tertiary Structure

Even though the cloverleaf structure is often conveniently used to illustrate the conformation of the tRNA, X-ray crystallographic studies showed that it can fold into a higher order structure by additional H-bonding between regions that are unpaired in the cloverleaf structure. The crystal structure of yeast tRNAPhe was first published at 2.5 Å resolution in 1975 (reviewed by Kim, 1978; Rich and RajBhandary, 1976). It is a flat, L-shaped molecule that is about 20-25 Å thick (Rich and RajBhandary, 1976). The amino acid acceptor CCA group is localized at one end of the L, extending out into the solvent, some 70 Å from the anticodon, which occurs at the opposite end. The dihydrouracil-rich (D), the variable and the TΨC loops stack to form the corner of the L. The double stranded regions of tRNAPhe approximate an RNA A helix. In this helix the base-pairs are tilted with respect to the helix axis and do not intersect with the axis. This results in a 6 Å hole running through the center of the helix. The A helix has a very deep major groove and a very shallow minor groove (Fig. 2). A major contribution to the stability of tRNA structure is made by the extensive base-stacking present in the molecule. All but five bases in tRNAPhe are involved in the stacking interactions (Holbrook et al., 1978), forming the two columns or arms of the L-shaped structure. Many tertiary hydrogen-bonding interactions involve bases that are invariant in all known tRNAs, strongly supporting the belief that all tRNAs have basically the same tertiary configuration. Hydrogen bonds form an intricate network holding the two arms of the tRNA in the correct orientation to one another. All the base-pairs in the major groove of the D-stem are involved in tertiary hydrogen bonds with the variable loop. The conserved GG doublet in the D-loop is bonded to the TΨC sequence in the T-loop. Uridine 8 and A9, located between the acceptor stem and the D stem, are hydrogen bonded to A14 and A23 respectively. With the exception of the G19-C56 bond, none of the tertiary hydrogen bonds involve conventional AU and GC pairs. Other tertiary interactions stabilizing sharp bends in the molecule involve groups of the ribose-phosphate backbone, including the 2'-OH of the ribose sugars.
Only a few tertiary hydrogen bonds hold the anticodon stem to the remainder of the molecule, raising the possibility that the relative orientation of the anticodon region may change during protein synthesis.

Since the structure of yeast tRNAPhe was elucidated, the tertiary structures of several other tRNAs have been determined at varying degrees of resolution (E. coli tRNAfMet, Woo et al., 1980; yeast tRNAAsp, Moras et al., 1980; yeast tRNAiMet, Shevitz et al., 1980). The structure of yeast tRNAPhe appears to be typical of at least those tRNAs with a small variable loop. The structure of tRNA in solution is very similar to its structure in a crystal lattice. Evidence gathered by a wide variety of techniques, including oligonucleotide binding, tritium exchange, base-specific chemical modification and NMR spectroscopy, tends to support this conclusion (reviewed by Kim, 1978).

Fig. 2. Tertiary interactions in yeast phenylalanine tRNA. (A) The molecule is drawn in the conventional cloverleaf structure with solid lines connecting bases that are additionally hydrogen-bonded. (B) Sequence rearranged to show continuous stacking of the anticodon stem on the D stem and of the acceptor stem on the TΨC stem. Note the close interaction between the TΨC and D loops. (C) Diagram illustrating the folding of the yeast tRNAPhe molecule. The ribose-phosphate backbone is drawn as a continuous ribbon, and internal hydrogen bonding is indicated by crossbars. Positions of unpaired bases are indicated by rods that are intentionally shortened. The shaded areas represent the two ends of the folded molecule, the anticodon and the amino acid acceptor stem. The numbering of the nucleotides is identical to the above. (After Kim et al., 1974.)

Structure of Eukaryotic tRNA Genes

A DNA sequence containing all the information necessary to code for a complete tRNA structure is generally defined as a tRNA gene.
However, this is merely an operational definition, since the extent of the flanking regulatory elements necessary for proper transcription of the structural sequence is as yet ill-defined. A large inventory of tRNA genes from a variety of organisms has now been cloned and sequenced (Sprinzl et al., 1987), representing most of the 61 possible "sense" codons of the genetic code. In contrast to prokaryotes, the 3' terminal CCA end of the mature tRNA is not encoded in eukaryotic genes but must be added post-transcriptionally by the enzyme nucleotidyltransferase (Deutscher, 1982). Almost all transcriptional units are monomeric and appear not to depend on the coherent organization of an operon. An exception to this generalization has been observed in yeast, where the tRNASer genes exist in dimeric forms with the tRNAMet genes (Mao et al., 1980).

To date, no specialized sequence in the 5'-flanking region has been found that is conserved for all tRNA genes of an organism. In fact, the general observation is that even within the set of genes for a single tRNA isoacceptor, little 5'-flanking sequence homology can be found. Similarly, the 3'-flanking sequences of tRNA genes are highly variable apart from their high A+T nucleotide content. The best conservation is associated with a stretch of T nucleotides that functions as a polymerase III termination signal (Silverman et al., 1979; Valenzuela et al., 1977).

The single D. melanogaster tRNAHis species is similar to all other tRNAHis species that have been sequenced in that the 5' stem of the acceptor arm is one nucleotide longer than in all other tRNA species. The 5' terminal nucleotide is an unpaired guanylate residue and is not encoded by the tRNA exon.
In vitro transcription of the tRNAHis gene from the 48F region of chromosome 2 demonstrated that this terminal guanylate residue is added post-transcriptionally, which appears to be a common mechanism for eukaryotic tRNAHis biosynthesis (Altwegg and Kubli, 1980; Cooley et al., 1982 and 1984).

Intron-Containing tRNA Genes

About 10-20% of nuclear-encoded tRNA genes in eukaryotes contain introns. Of the 300-400 tRNA genes in yeast, perhaps 40 contain introns. These intervening sequences range from 8 to 60 nucleotides in length. Within any family of isoacceptor tRNA genes they are, so far as is known, homologous; between different gene families, however, the intronic sequences are completely divergent. Introns appear to be rare in Drosophila tRNA genes, with their presence thus far confined to a pair of tightly linked tRNALeu genes at chromosomal site 50AB (see below). Sequences at the exon-intron boundaries are in general not conserved, although the intron is always located one base pair 3' to the anticodon in all cytoplasmic tRNA genes examined thus far (Abelson, 1979). In most precursor tRNAs transcribed from such intron-containing genes, the intervening sequence is complementary to the anticodon and to parts of the anticodon stem and loop. Notable exceptions have been found in the X. laevis tRNATyr gene, a tRNATrp gene of Dictyostelium discoideum, and an S. pombe tRNALys gene.

The influence of introns on the expression of tRNA genes has been examined in the yeast tRNATyr gene (SUP6). The intron was precisely excised from this gene (Guthrie and Abelson, 1982) and its ochre suppression phenotype was not affected. Similar studies, carried out in vitro, were reported by Wallace et al. (1980). They showed that deletions of 8, 10, 13, or 20 bp from the intron of a yeast tRNA3Leu gene yielded altered templates that were not impaired, relative to wild type, in their ability to direct in vitro transcription.
Insertion of extra oligonucleotides of up to 30 bp into a natural HpaI site inside the wild-type intron also had no ill effect on the templates. However, when an extra 103 bp fragment was inserted into the HpaI site, the transcription rate was slightly reduced. Thus, the in vitro data support the contention that transcription of tRNA genes is not sensitive to the absence, presence or variable length of introns. This notion is also consistent with the finding in S. cerevisiae that only one of two closely related tRNASer genes contains an intron (Olson et al., 1981). Both unlinked genes are equally functional in vivo, as they have been obtained as the SUP-RL1 amber (19 bp intron) and SUQ5 ochre (no intron) suppressing mutations, respectively. Additional evidence has recently been obtained in D. melanogaster, where two closely linked tRNALeu genes at chromosomal position 50AB have almost identical introns of 38 and 45 bp in length (Robinson and Davidson, 1980), while a third copy, as yet cytologically unmapped, is intron-less (V. Dartnell, personal communication). For these Drosophila genes, however, there is as yet no in vitro or in vivo evidence that introns affect their expression.

tRNA Variant Genes and Pseudogenes

An unusual feature of eukaryotic tRNAs and their genes that has emerged from molecular cloning and sequencing studies is the simplicity of the tRNA population relative to the number of tRNA genes. Individual tRNAs usually correspond to multiple, identical gene copies, even if these genes are derived from different chromosomal positions. However, potential tRNA genes whose nucleotide sequences differ from the known corresponding tRNAs have frequently been encountered. Examples of these include genes from the initiator tRNAMet families of D. melanogaster (Sharp et
al., 1981a), X. laevis (Koski and Clarkson, 1982), and human (Santos and Zasloff, 1981), and D. melanogaster variant genes for tRNA3bVal (Addison et al., 1984; Leung et al., 1984); tRNA4Val (Addison, 1982; Addison et al., 1981; Rajput et al., 1982); tRNA5Lys (DeFranco et al., 1982); tRNAGlu (Hosbach et al., 1980); tRNAArg (Newton, 1984); and tRNA4,7Ser (Cribbs, 1982; Cribbs et al., 1987a). All these genes show 1 to 6 bp differences from the known corresponding tRNAs, but otherwise exhibit strong homology and identical anticodon sequences. In those cases where in vitro transcriptional activities have been measured, the nucleotide differences do not appear to impair function (see Table VI).

A more extreme example, a variant tRNAHis gene, has recently been studied by Cooley et al. (1982 and 1984). The structural sequence differs from the tRNA by a consecutive alteration of 8 bp, beginning at position 38 in the anticodon stem, which could be explained by a simple inversion of the normal sequence. This gene is also poorly transcribed in vitro and the precursors are not properly processed. However, the inhibition of transcription could be alleviated by replacing the 5'-flanking sequence with that from another bona fide tRNAHis gene. Their results imply that the internal sequence alteration exerts only a minor influence on the poor transcriptional ability of the gene in vitro.

All such reported variant tRNA gene sequences have been classified as pseudogenes by Sharp et al. (1985). While the sequence heterogeneity is reminiscent of features found in Xenopus 5S rRNA genes (Jacq et al., 1977), their transcriptional competence is not. Thus, the term "pseudogenes" may be misleading, since it implies non-functional templates. Furthermore, the possibility that such variant tRNAs, including the more extreme tRNAHis, participate in specialized cellular metabolism analogous to that of the tRNAAla and tRNASer in the silk gland of Bombyx (Sprague et
al., 1977; Hentzen et al., 1981) cannot as yet be ruled out. In S. cerevisiae, such variant genes are apparently either rare or totally absent (Guthrie and Abelson, 1982). The reason is unknown, but it may suggest that the mechanisms of sequence rectification in the yeast genome are much more stringent than those in higher eukaryotes.

Other recorded heterogeneities are more drastic; they consist of incomplete genes or remnants homologous to known tRNA genes. DNA sequencing of plasmids hybridizing to initiator tRNA in Drosophila has revealed that one clone has several DNA segments homologous to parts of tRNAiMet. The longest region of homology corresponds to positions 7 through 39 within the coding sequence, which represents approximately 50% of the intact gene (Sharp et al., 1981a). The D. melanogaster DNA insert of this plasmid hybridizes to more than 30 sites in the D. melanogaster genome, in a pattern reminiscent of middle repetitive or mobile genetic elements. Parallel observations have been made with a plasmid clone cross-hybridizing to a tRNAArg probe (Newton et al., manuscript in preparation). This plasmid contains the 3' half of an expected tRNAArg gene, starting at position 37 and ending at the mature 3' end at position 76. In situ hybridization with a 600 bp restriction fragment containing this gene fragment also shows the pattern characteristic of middle repetitive elements. The immediate 3' end does not contain the anticipated poly-T tract as the putative termination signal; curiously, however, it does include the triplet CCA, which is normally added post-transcriptionally. A similar case of a tRNAPhe pseudogene has been identified in the mouse genome (Reilly et al., 1982). This truncated gene is homologous to the known tRNAPhe from position 39 to 76, including the terminal CCA triplet.
The inclusion of the CCA sequence at the end of the tRNAArg and tRNAPhe genes is unusual, and it has been proposed that the information of the mature tRNA was reverse transcribed and then integrated into the genome by a retrotransposon-like mechanism (Denison et al., 1982; Hollis et al., 1982; Wilde et al., 1982).

Other pseudogenes depart less strikingly from complete genes. In the rat genome, the genes for tRNAAsp, tRNAGly and tRNAGlu are tightly clustered on a 33 kb EcoRI fragment which is reiterated about ten times in the haploid genome. Sequence analysis of six copies reveals that five of the tRNAGly genes have deletions of seven nucleotides between residues 20 and 26 (Brown and Sugimoto, 1973; Shibuya et al., 1982). Three of the tRNAGlu genes lack 14 nucleotides, extending from 11 nucleotides before the 3' end to 3 nucleotides beyond it.

All of the above fragmented genes in Drosophila (Sharp et al., 1981a), mouse and rat (Shibuya et al., 1982) failed to support RNA synthesis in vitro. Even if they were transcriptionally competent, the hypothetical transcripts should fail to achieve the correct tertiary conformation typical of tRNAs. Whether these sequences retain the ability to compete for transcriptional factors has not been examined. These incomplete genes, in their degenerate state, may more correctly be called pseudogenes.

A novel pseudogene arrangement representing the fusion of two different tRNA gene sequences has recently been reported in D. discoideum (Dingermann et al., 1985). A cloverleaf-like structure resembling a tRNAHis pseudogene overlaps a segment of DNA encoding a tRNAVal: the 5' terminal sequence of the tRNAVal gene, GTTCG, serves to form the common TΨC loop of the tRNAHis pseudogene. The tRNAHis pseudogene does not encode several of the conserved nucleotides, found in all tRNAs, in the D loop and the anticodon loop.
Thus, a putative RNA transcribed from the pseudogene would fail to achieve the tertiary structure of a normal tRNA.

Number, Diversity and Organization of tRNA Genes

The total number of tRNA genes in Drosophila melanogaster has been estimated to be 750 per haploid genome by Ritossa et al. (1966) and Tartof and Perry (1970). A slightly lower value of 590 copies per haploid genome was given by Weber and Berger (1976). These estimates correspond to approximately 0.013%-0.015% of the total DNA. By reverse-phase chromatography, total Drosophila tRNAs can be resolved into 63 major and 39 minor isoaccepting peaks (Grigliatti et al., 1973; White et al., 1973). Many of these are probably "homogeneic"; that is, tRNAs that are transcribed from the same genes but modified to different extents post-transcriptionally. If transcription is proportional to the number of these genes, this would predict an average redundancy of 10-13 copies for each tRNA gene.

Crude estimates of gene numbers for particular isoacceptor tRNAs localized at specific sites were obtained by Elder et al. (1980a,b) from grain densities after in situ hybridization. Their results suggest that two genes each for tRNA2Met are localized in the regions 48A and 72F-73A, whereas eight and five tRNA2Arg genes are localized at 42A and 84EF, respectively. However, hybridization methods are generally deemed inaccurate, as demonstrated by Tener et al. (1980) in their attempt to estimate the number of genes by in vitro hybridization both in solution and on filters: the plateau level of hybridization (the equilibrium) depends on the concentration of tRNA used. These experiments indicate that many hidden factors can complicate such hybridization experiments, including the extent of modification of the tRNA probes and the presence of polymorphic tRNAs and pseudogenes (see above).
Other features of tRNA gene organization and arrangement may also affect the ability to detect genes by hybridization. A major finding of the DNA sequence analysis of the D. melanogaster recombinant plasmid pCIT12 concerned the arrangement of the individual genes (Hovemann et al., 1980). Eight tRNA genes have been detected on pCIT12; they are irregularly spaced and arranged such that five genes lie in one transcriptional direction and three in the other. Since the direction of transcription differs among genes of the same isoacceptor, the tRNA genes are capable of forming inverted repeat structures in which the homology extends over the entire coding region. Structures with inverted repeat stems of 70-100 bp were observed by electron microscopy during analysis of heteroduplexes of pCIT12 and the vector DNA ColE1 (Yen and Davidson, 1980). The nature of the inverted repeats has been clarified by DNA sequencing: they are formed by homologous tRNA genes of opposite polarity. The occurrence of inverted repeats thus explains the difficulty in detecting tRNA genes in the original heteroduplex analysis with radioactive tRNA probes (Hovemann et al., 1980). If similar tRNA gene arrangements occur with reasonable frequency in the genome, one would expect hybridization-based estimates of gene number to fall below the actual value.

One major advantage of studying tRNA gene organization in Drosophila is that the large polytene chromosomes can be exploited as hybridization templates. Transfer RNA probes of high specific activity can be obtained by iodination with 125I and hybridized directly to the denatured chromosomes (Commerford, 1971; Prensky, 1976). With radiolabelled total 4S RNA as a probe, over 50 sites of hybridization could be detected, approximately half of which are major sites (reviewed by Elder et al., 1980a). Most of them are distributed randomly over the four arms of the two large autosomes.
The only major site on the X is at bands 12E, and no hybridization was observed over the small chromosome 4. Further refinement of in situ localization was achieved using highly purified tRNA isoacceptors (Grigliatti et al., 1973, 1974; Delaney et al., 1976; Dunn et al., 1978, 1979a; Kubli and Schmidt, 1978; Schmidt et al., 1978; Hayashi et al., 1980; Schmidt and Kubli, 1980). These parallel lines of experiments showed that, in general, (1) many isoacceptors can be found at more than one site in the Drosophila melanogaster genome, and (2) more than one isoacceptor can be detected in the same region.

Even though in situ hybridization is a powerful mapping technique, its resolution is too low to tell whether genes for the different isoacceptors are intermingled or segregated from each other. This problem has been overcome by molecular cloning and DNA sequencing, through which the exact number of genes and their relative arrangement within a cluster can readily be determined. The first isolated recombinant plasmid containing Drosophila tRNA genes was pCIT12, derived from the 42A region (Yen et al., 1977). DNA sequencing of segments containing tRNA genes (Hovemann et al., 1980) showed the presence of eight genes within a 9.3 kb region: three for tRNAAsn, three for tRNA2Lys, one for tRNA2Arg, and one for tRNAIle. The analysis was extended in both directions by "molecular walking" (Yen and Davidson, 1980; see Chapter I), in which a total of 94 kb of sequence derived from 42A was recovered on overlapping recombinant λ phages. By restriction mapping and subsequent DNA sequencing, a total of 18 genes were identified (including those on pCIT12) scattered over a region of approximately 46 kb: eight for tRNAAsn, four for tRNA2Arg, five for tRNA2Lys, and a single tRNAIle gene. These genes are irregularly spaced and are transcribed from both strands of the DNA. Furthermore, redundant genes for a particular tRNA isoacceptor contain identical sequences (fig.
3).

The other major hybridization site, at 90BC, has been similarly analyzed by molecular walking (DeLotto and Schedl, 1984). Although the analysis is incomplete, at least six tRNA genes have been identified scattered over approximately 31 kb, which the authors divide into seven smaller regions: in region 1, one gene for tRNA3bVal and one for tRNAPro; in region 2, one or possibly two for tRNAAla and one for tRNAPro; in region 4, two for tRNAThr. The other regions (3, 5, 6 and 7) have not been sequenced but are thought to contain additional tRNA genes. The characterized tRNA genes are irregularly interspersed, as at 42A, and are likewise transcribed from both DNA strands (fig. 3).

Fig. 3. Two typical chromosomal sites enriched for tRNA genes. Both sites have been characterized by molecular walking and DNA sequencing. The size of the cloned region is indicated on the right in kilobases (kb). Top: The 50 kb region from the 42A site has 18 genes encoding five different tRNA isoacceptors (modified from Yen and Davidson, 1980). Bottom: The 31 kb region from the 90BC site has at least six genes encoding four different tRNA isoacceptors. The structural genes are depicted as dots and their direction of transcription is indicated by arrowheads from 5' to 3'. At the 90BC site, genes in regions 5, 6, and 7 have not been sequenced but are known to contain homology to total 4S RNA from Southern hybridization studies. The tRNAVal and tRNAPro genes in region 1 are identical to those reported in Chapter IV of this thesis. The sequence of one tRNA gene (?) has only been partially determined, but it probably corresponds to another tRNAAla gene (modified from DeLotto and Schedl, 1984).

Although not as extensively studied as 42A and 90BC, other smaller gene clusters derived from different chromosomal locations have also been isolated in plasmid clones. These include genes for tRNA2Lys (Gergen et al., 1981), tRNAGlu (Hosbach et al., 1980), tRNALeu and tRNAIle (Robinson and Davidson, 1980), tRNAGly (Hershey and Davidson, 1980), tRNAiMet (Sharp et al., 1981a), tRNA4Val (Addison, 1982; Addison et al., 1981), and tRNAArg (Newton, unpublished). All these smaller clones show the same pattern of irregular gene arrangement seen at 42A and 90BC.

Hybridization studies with yeast DNA by Schweizer et al. (1969) and Feldmann (1976) showed that there are approximately 360 tRNA genes. This suggests an average reiteration frequency on the order of eight genes per tRNA species. The general picture that has emerged from the isolation of various nonsense suppressor tRNA genes and from random cloning experiments (Guthrie and Abelson, 1982) is that a particular tRNA species is encoded by multiple, but solitary, genes found on different chromosomes. In fact, there is little evidence for clustering either of isoaccepting species or of tRNA genes per se. With possibly rare exceptions, such as dimeric tRNA genes that are coordinately transcribed (Schmidt et al., 1980), the overall distribution of tRNA genes may be close to random. This lack of organization contrasts with the situation in Drosophila, where tRNA genes are typically found in clusters containing multiple copies of several different tRNA genes (see above).

In the haploid human genome, there are approximately 1000 tRNA genes, representing about 60 different genes of 10 to 20 copies each (Hatlen and Attardi, 1971). There is some evidence suggesting that the organization of tRNA genes in human is similar to that in the fruit fly. The initiator tRNAMet genes have been cloned and sequenced. They are solitary genes embedded within sequences of high homology and are distributed randomly throughout the genome (Santos and Zasloff, 1981). In contrast, interspersion of different tRNA gene types has been detected in λ phage clones from a human library.
One copy each of the genes for tRNALys, tRNAGln, and tRNALeu has been localized within 1.6 kb, the genes being separated from each other by about 0.4 to 0.5 kb (Roy et al., 1982). Recently, a tRNAGlu gene was identified on a 2.4 kb fragment that also contains a sequence capable of folding into a tRNA-like structure (Goddard et al., 1983). Whether this is fortuitous or a pseudogene is not known. Clusters composed of different neighboring tRNA genes could also be the predominant theme in the rat and mouse genomes. Although detailed structural analyses in these two organisms are lacking, several recombinant clones containing a mixture of genes for tRNAAsp, tRNAGly, and tRNAGlu have been discussed in the section dealing with pseudogenes.

Transcription of tRNA Genes

Transcription studies of tRNA genes from different organisms demonstrated that the promoter elements necessary to direct accurate transcription reside within the genes (Ciliberto et al., 1983; Murphy and Baralle, 1984; Stewart et al., 1985). Nuclease-mediated deletions performed on these templates have shown that the essential internal control regions (ICRs) are split into two non-contiguous sequences corresponding to the D and T loops of the tRNA, now variously termed the 5' ICR or A box and the 3' ICR or B box, respectively (Sharp et al., 1983a). Because these ICRs are highly conserved throughout eukaryotic tRNA genes (excluding organelle tRNAs), they would appear to serve a critical function in gene regulation as well as in the tRNA itself. The comparison of ICRs in over 100 tRNAs and tRNA genes has led to the generalized 5' ICR sequence, 5'-TRRYNNARYGG-3', corresponding to positions 8 to 19 within the D loop, and the 3' ICR sequence, 5'-GGTTCGANTCC-3', corresponding to positions 52 to 62 within the T loop (Sprinzl et al., 1987; Sharp et al., 1985). The sequence of the 3' ICR is highly constrained, incidentally, in both prokaryotes and eukaryotes.
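The two ICR consensus sequences quoted above are written in IUPAC notation (R = purine, Y = pyrimidine, N = any nucleotide), so a candidate sequence can be screened for them mechanically. The following is a minimal sketch only: the function names and the test sequences are hypothetical illustrations, not data or methods from this thesis, and exact matching to a consensus is a much cruder criterion than the functional assays discussed in this section.

```python
import re

# IUPAC nucleotide codes that occur in the two consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any nucleotide
}

FIVE_PRIME_ICR = "TRRYNNARYGG"   # A box consensus as quoted in the text
THREE_PRIME_ICR = "GGTTCGANTCC"  # B box consensus as quoted in the text


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression string."""
    return "".join(IUPAC[base] for base in consensus)


def find_matches(consensus, sequence):
    """Return the 0-based start offsets of all matches, overlapping ones included."""
    # A lookahead is used so that overlapping matches are not skipped.
    pattern = re.compile("(?=%s)" % consensus_to_regex(consensus))
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical test sequences, chosen only to exercise the code.
    print(find_matches(FIVE_PRIME_ICR, "AATGGCTCAGTGGCC"))
    print(find_matches(THREE_PRIME_ICR, "GGTTCGAATCC"))
```

The offsets are positions in the supplied string, not tRNA position numbers; mapping them onto the standard numbering would require an alignment that this sketch does not attempt.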
The stringent conservation of the 3' ICR may reflect the added importance of the T loop in the proper function of the tRNA itself, rather than a role strictly confined to gene regulation per se.

Mutational analyses performed on the S. cerevisiae tRNATyr (Kurjan et al., 1980; Allison et al., 1983), the Xenopus tRNAiMet (Folk and Hofstetter, 1983) and the S. pombe tRNASer genes (Willis et al., 1984) have all confirmed the importance of the split promoter elements in mediating accurate tRNA gene transcription. Many point mutations that caused drastic decreases in the transcriptional ability of the templates all mapped to these elements.

Additional sequences outside these central promoters may also affect tRNA gene function. In the S. pombe tRNASer gene (Willis et al., 1984), a transcription-down mutation (A4$) has been mapped to the junction of the extra arm and the T stem. Similar down mutations within the extra arm and T stem coding region have been reported for the tRNAPro gene of C. elegans (Ciliberto et al., 1982) and the tRNATyr gene of S. cerevisiae (Kurjan et al., 1980; Allison et al., 1983). It is not clear whether these mutations have uncovered yet another separate control element, or whether the extra arm is merely an extension of the 3' ICR. Sharp et al. (1981b) favored the latter explanation on the basis of their 3' deletion analyses of the D. melanogaster tRNAArg gene. However, their interpretation is weakened by their use of grossly altered templates rather than more selectively targeted mutations. Thus, the separate contributions of the extra arm and the T loop to the overall transcription efficiency of the gene could not be cleanly separated in their studies.

In contrast to the 3' ICR, the canonical sequence of the 5' ICR is much more degenerate. Furthermore, comparison of all known sequences of tRNAs and tRNA genes shows that the D loop can also vary in length (Sprinzl et al., 1987).
Extra nucleotides have been localized adjacent to positions 17 and 20, and are numbered 17A, 20A and 20B accordingly. Also, the 5' ICR (GGTCTAGTGG) of the C. elegans tRNAPro gene is functionally interchangeable with the first 11 nucleotides (AGCCAAGCAGG) of the 5S rRNA gene promoter from Xenopus (Ciliberto et al., 1983). Even though both sequences conform to the 5' ICR consensus motif, they differ from each other by many point changes. This high degree of functional flexibility in the 5' promoter would imply that the DNA sequence per se is not critical in conferring transcriptional competence on the tRNA gene. However, the "hybrid" 5S/tRNA constructs have only been tested in Xenopus extracts; it would be interesting to test whether extracts derived from C. elegans are equally indiscriminate. This is because certain polymerase III transcription extracts have been postulated to confer species specificity on some tRNA genes through some unknown interaction between a "compatible" 3' ICR and the 5'-flanking sequence (Sharp et al., 1985; see Formation of Transcriptional Complexes).

If the primary DNA sequence of the 5' ICR is not critical for transcription of tRNA genes, could maintenance of the stem-loop structure be sufficient to sustain template activity? Questions of this sort have been explored by constructing three D-stem mutants of a yeast tRNALeu gene with synthetic oligonucleotides (Mattoccia et al., 1983). The first mutant contains simultaneous changes of GCC to AAA at positions 10, 11 and 12, while the second mutant contains changes of GGC to TTT in the complementary strand, corresponding to positions 24, 25 and 26. Each of these mutations would disrupt the proper base pairing in the D stem. The third mutant is a double mutant, coupling both sets of changes to preserve the stem-loop configuration in the 5' ICR.
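The logic of the three D-stem mutants can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the two stem halves are taken to pair in antiparallel fashion (positions 10-12 against 24-26, as the text implies), only Watson-Crick pairs are counted (G-T or G-U wobble pairs are deliberately ignored), and the function and variable names are illustrative rather than part of the original study.

```python
# Watson-Crick partners for the DNA (coding-strand) alphabet.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_pair(five_prime_half, three_prime_half):
    """True if two stem halves, both written 5'->3', form only Watson-Crick pairs."""
    return reverse_complement(five_prime_half) == three_prime_half


# (positions 10-12, positions 24-26) for each template described in the text.
TEMPLATES = {
    "wild type": ("GCC", "GGC"),
    "AAA mutant (10-12)": ("AAA", "GGC"),
    "TTT mutant (24-26)": ("GCC", "TTT"),
    "double mutant": ("AAA", "TTT"),
}

if __name__ == "__main__":
    for name, (left, right) in TEMPLATES.items():
        print(name, "pairs" if can_pair(left, right) else "disrupted")
```

Under these assumptions the wild type and the double mutant both retain a fully paired stem while each single mutant loses it, which is the design the experiment relies on to separate sequence from structure.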
Transcription of the yeast genes in the heterologous Xenopus germinal vesicle system showed that none of the three mutants is dramatically affected. However, accurate excision of the intron occurs only in the double mutant, where the proper D stem-loop structure can be maintained (Baldi et al., 1983). The behaviour of the mutants is entirely different in homologous yeast extracts (Newman et al., 1983). The AAA10-12 mutation reduced transcription 10-fold, while the complementary mutant TTT24-26 showed very little effect. The double mutant is also poorly transcribed. It would appear that in the homologous system the sequence of the control region is critical, while the capacity to form a stem-loop structure is not. These opposite results caution that the degeneracy of the 5' ICR consensus sequence distilled from many diverse organisms may not necessarily indicate functional flexibility. It could also mean that the great diversity in sequence confers some "sequence context" specific to a particular tRNA gene that is read differently by different transcriptional complexes.

Flanking Modulatory Sequences

While the internal split promoter sequences are essential for faithful transcription of tRNA genes, deletion of the upstream and downstream flanking sequences can also markedly affect the rate of transcription. One method for identifying potential modulatory elements is simply to compare the flanking sequences of a large number of tRNA genes. For some members of certain tRNA gene families, the 5'- and 3'-flanking sequences are highly conserved. These include the genes coding for the Drosophila melanogaster tRNAiMet (Sharp et al., 1981a), tRNAGlu (Hosbach et al., 1980), tRNAGly (Hershey and Davidson, 1980), and tRNAArg at 12DE (Newton, 1984; Chapter III), the human tRNAiMet (Santos and Zasloff, 1981), the tRNATrp of D. discoideum (Peffley and Sogin, 1981) and the rat tRNAAsp (Shibuya et al., 1982).
The sequence conservation in these gene members often extends several hundred base pairs in both directions from the structural genes. However, it is unlikely that these conserved flanking sequences are maintained by virtue of function; rather, their patterns suggest duplicative events, either by unequal crossing-over (tRNAArg and tRNAGlu genes of Drosophila) or by transposition-like mechanisms (tRNATrp gene of D. discoideum). With a few exceptions (see below), sequence conservation of this type is generally rare. In fact, flanking sequence similarities that can be attributed to a transcriptional control function are not common.

5' negative modulatory sequences resembling RNA polymerase III termination signals have been shown to inhibit in vitro transcription of tRNA2Lys (DeFranco et al., 1980 and 1981) and tRNA2Arg (Dingermann et al., 1982) genes from D. melanogaster. The sequence GGCAGTTTTTG is well conserved in front of a number of tRNA2Lys genes. This conserved element is positioned from about -12 to -25 relative to the start of the structural gene, although the absolute positions do vary among the different genes (Hovemann et al., 1980). Deletion of this sequence in tRNA2Lys gene 2, and its replacement with pBR322 DNA, led to a dramatic increase in transcriptional efficiency in X. laevis germinal vesicle (GV) extract (DeFranco et al., 1980). The function of this element is extremely sensitive to its position relative to the gene; moving the sequence only one base pair closer to the tRNA gene can substantially neutralize its inhibitory effect (DeFranco et al., 1981). Another tRNA2Lys gene (gene 4 in their studies) has only four consecutive T residues in its conserved element; however, this gene is efficiently transcribed in GV extracts. The D. melanogaster tRNA2Arg gene also has five consecutive T residues, beginning at -21 from the structural gene (Dingermann et al., 1982).
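Because the argument in this section turns on the length and position of T runs in the 5' flank, it can help to tabulate them in a uniform way. The sketch below is illustrative only: the function name, the default minimum run length, and the example flank (the conserved element GGCAGTTTTTG followed by an arbitrary 12-nucleotide spacer) are assumptions, not sequences or methods taken from the studies cited.

```python
import re


def t_runs(flank, min_len=4):
    """List (position, length) for every run of T residues of at least min_len.

    `flank` is the 5'-flanking sequence written 5'->3' and ending at position -1,
    i.e. its last base immediately precedes the first base of the structural gene.
    The reported position is that of the first (most upstream) T of the run.
    """
    flank = flank.upper()
    runs = []
    for match in re.finditer(r"T{%d,}" % min_len, flank):
        position = match.start() - len(flank)  # convert offset to negative numbering
        runs.append((position, match.end() - match.start()))
    return runs


if __name__ == "__main__":
    # Hypothetical flank: conserved element plus an arbitrary T-free spacer.
    example_flank = "GGCAGTTTTTG" + "ACGACGACGACG"
    print(t_runs(example_flank))
```

A table of such runs for each gene, set beside its measured transcription efficiency in each extract, is one simple way to see whether run length and inhibition actually correlate.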
This sequence also appears to inhibit efficient transcription of the tRNA gene, but only in homologous extracts and not in extracts derived from either HeLa cells or GV. A similar stretch of five consecutive T residues has also been identified in another Drosophila tRNA3Lys gene. These residues are located at -20 to -24 in front of the gene, and yet when challenged by in vitro transcription in Kc0 cell extracts, this gene is highly efficient. Taken collectively, these studies show that the notion of the 5' poly T residues merely acting as general RNA polymerase III termination signals to inhibit transcription is too simplistic. As pointed out by Sharp et al. (1985), the length of such poly T tracts does not consistently correlate with the strength of the inhibitory effects observed in extracts prepared from HeLa cells and Drosophila Kc0 cells. In this regard, it is worth noting that an active SUP4-o locus (i.e. functional in vivo) in yeast, which codes for an ochre suppressor tRNATyr, does contain six T residues positioned at -14 to -19. Moreover, this gene is also efficiently transcribed in homologous cell-free extracts (reviewed by Sherman, 1982). Likewise, in Bombyx mori, there are two blocks of five T residues within the first 21 bp 5' to the tRNA2Ala gene. The natural terminator is composed of a single block of four T residues located 17 bp downstream from the mature coding sequence. Yet this gene is constitutively expressed in the silkworm as well as efficiently transcribed in vitro (Sprague et al., 1980). An extensive study of the X. laevis tRNAiMet gene by systematic resections of the 5'-flanking region described a surprisingly complex mosaic of modulatory elements, including other inhibitory sequences with the potential to form Z-DNA (Clarkson et al., 1981).

Positive modulatory sequences have also been tentatively identified.
In the Drosophila tRNA2Arg gene, removal of sequences between -8 and -33 in the 5'-flanking region caused a 95% reduction in transcription efficiency (Sharp et al., 1981b). However, the nature of the positive modulatory element(s) has not been well defined in these crude deletion studies. More recently, by using site-directed mutagenesis, Sajjadi (1987) showed that a pentanucleotide, TCGCT, may play a positive modulatory role in the transcription of a Drosophila tRNAVal gene. A degenerate form of this sequence, TNNCT (N = any nucleotide), is also imperfectly correlated with other tRNA genes that are efficiently transcribed in vitro. Since the element is rather short and the degenerate form may occur frequently, Sajjadi (1987) proposed that its functional competence may either be positionally dependent (beyond -30) or act in concert with additional surrounding modulatory elements. The most completely characterized positive modulatory element to date is that in S. cerevisiae. Four of the nine tRNA3Leu genes showed extensive 5'- and 3'-flanking sequence homologies, in addition to homology in their intervening sequences (Raymond and Johnson, 1983; Frischloff et al., 1984). Deletion mutagenesis performed on the amber suppressor tRNA3ALeu gene showed that the 5'-flanking region between -1 and -15 is critical for in vitro activity in yeast cell extracts (Raymond et al., 1985). This region also contains a pentadecanucleotide sequence, TTTCAACAAATAAGT, that is highly conserved in all four genes. Clones with progressively deleted 5' flanks were transformed into different yeast strains containing the amber mutations lys2-801, met8-1 and tyr7-1. Upon transformation with the yeast-vector clones, suppression is very effective at the met8-1 locus with all forms of the tRNA3ALeu constructs. Suppression of the lys2-801 and tyr7-1 mutations in the yeast host strain parallels the template activities in vitro, correlating with the absence or presence of the putative modulatory element.
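The conservation of such an element among different genes can be expressed as the percentage of positions at which a candidate 15-mer agrees with the tRNA3Leu reference sequence. The following is a minimal sketch in Python and is not the procedure used by Raymond et al.; the function names and the candidate sequence are hypothetical and serve only as an illustration. Matching bases are shown in capitals and mismatches in lower case.

```python
# Reference pentadecanucleotide quoted in the text for the tRNA3Leu genes.
REFERENCE = "TTTCAACAAATAAGT"


def percent_match(candidate: str, reference: str = REFERENCE) -> int:
    """Percentage of positions at which candidate and reference agree.

    The comparison is ungapped and case-insensitive; both sequences
    must have the same length.
    """
    if len(candidate) != len(reference):
        raise ValueError("sequences must have equal length")
    matches = sum(c.upper() == r.upper() for c, r in zip(candidate, reference))
    return round(100 * matches / len(reference))


def mark_mismatches(candidate: str, reference: str = REFERENCE) -> str:
    """Return candidate with matches in capitals and mismatches in lower case."""
    if len(candidate) != len(reference):
        raise ValueError("sequences must have equal length")
    return "".join(
        c.upper() if c.upper() == r.upper() else c.lower()
        for c, r in zip(candidate, reference)
    )


if __name__ == "__main__":
    # Hypothetical candidate differing from the reference at three positions.
    candidate = "TATCAACAAGTAATT"
    print(mark_mismatches(candidate), percent_match(candidate))
```

For the illustrative candidate, 12 of the 15 positions agree with the reference, so the sketch reports a match of 80%.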
They noted that this sequence is also well conserved in 14 other yeast tRNA genes, although its position varies somewhat among them (fig. 4). The tRNALeu, tRNATyr, tRNAArg and tRNAGlu genes, which show the best fit to the sequence, all code for tRNAs that are abundant in S. cerevisiae (Ikemura and Ozeki, 1982). The authors proposed that this sequence may be one mechanism by which yeast cells adjust tRNA biosynthesis to match the demand created by codon use preferences.

NAME       SEQUENCE           % MATCH
LEU3       TTTCAACAAATAAGT    100
SUP53      TaTCAACAAgTAAtT     80
TYRSUP4    aTTCAACAAtTAAaT     80
TYRTYG     TTaCAACAAAaAaGa     73
ARG19?     aTTCAACAAATAgta     73
GLUPY20    aTgCAACAaATAaGT     73
ARG18U     TTaCAACAAAaAgta     67
TRP        TTTCAAgAtggAAGa     67
GLN        TTTtAgCAAAaAAag     67
SERAL1     aTTtAAaaaATAAtT     60
PHEPT5     gagCAACtAATAtaT     60
GLUPY5     TTaaAtCAAAaAtGa     60
MET3       aTTCAgtAgaTaAGT     60
HISFD12    TaTCaAttgAaAAGT     60
HISFD2     gTTCAtaAAgaAAtT     60

[Position scale of the original figure: -30 to 20 relative to the coding sequence]

Fig. 4. 5'-Flanking sequences of different tRNA genes from S. cerevisiae. The putative positive modulatory sequence of the tRNA3Leu gene, TTTCAACAAATAAGT, is given in capital letters and used as the reference sequence (100%). The other genes with similar pentadecanucleotide sequences are listed below. The bases which correspond to the tRNA3Leu sequence are capitalized and mismatches are in lower case. The position of the sequence relative to the coding sequence is depicted and the genes ordered by percent match (from Raymond et al., 1985).

DNA sequences referred to as sigma and delta have been found adjacent to tRNA genes of yeast (del Rey et al., 1982; Eigel and Feldmann, 1982). The 340 bp sigma element is repeated many times in the genome and, when found adjacent to a tRNA gene, is always located either 16 or 18 bp to the 5' side of the mature coding sequence. The position of the 340-bp delta sequence relative to the tRNA genes is not as precise.
Both of these elements have been hypothesized to influence transcriptional regulation of the adjacent genes and share a common sequence, -CAACA-, found very near their ends. This same sequence constitutes part of the conserved pentadecanucleotide observed in tRNA3Leu. Other examples of positive modulatory sequences have been tentatively identified in genes coding for a human tRNAGlu (Goddard et al., 1983) and in the tRNA2Ala gene from Bombyx mori (Larson et al., 1983; Young et al., 1986). The human tRNAGlu gene is transcribed very efficiently in HeLa cell extracts and its 5'-flanking sequence is capable of forming a tRNA-like structure. This tRNA-like structure has been brought to attention as a potential regulatory element, but has not been critically tested. From deletion analyses and flanking sequence replacement of the tRNA2Ala gene derived from Bombyx, the hypothetical positive modulatory element has been roughly localized between positions -16 and -37, although at present the nature of this sequence is not as tightly defined as that in yeast (Young et al., 1986). The influence of the 3'-flanking region on transcription of tRNA genes has not been as well characterized. Initial work showed that the human tRNAiMet gene supported both in vitro and in vivo transcription even though its natural 3'-flanking region had been entirely replaced with a thymidine kinase gene (Adeniyi-Jones et al., 1984). However, St. Louis (1985), in transcription experiments employing a Drosophila tRNASer hybrid gene (designated pDt73) whose 3' end had been fortuitously removed during cloning, reported that this gene is virtually inert in homologous extracts. Re-attachment of a 3' region from another tRNA4Ser gene somehow allowed this construct to regain a detectable, but low, level of transcription. The 3'-flanking region of the Bombyx gene mentioned above appears to be required for full transcriptional activity (Wilson et al., 1985).
The use of high template DNA concentrations to overcome the effects of an inhibitor present in extracts was found to be partly responsible for masking the contribution of this region in their previous investigations. This region has now been shown to participate in the binding of factor(s) during transcriptional activation (unpublished, cited in Young et al., 1986). This observation has been fortified by studies using a yeast suppressor tRNATyr (SUP4-o) gene (Allison and Hall, 1985). Deletions into the oligothymidylate sequence 3' to the gene appear to diminish its ability to compete for a limiting transcription factor in the extracts. This is also consistent with nuclease protection experiments (Klemenz et al., 1982; Camier et al., 1985), in which this region in several tRNA genes is resistant to nuclease attack.

Formation of Transcriptional Complexes

All Class III genes, including those for 5S rRNA, VA RNA, Alu I, some snRNAs and tRNAs, are transcribed by RNA polymerase III. Initial studies with the purified enzyme from mature X. laevis oocytes and human KB cells failed to produce transcriptional activity with either 5S rDNA or VA DNA templates (Parker et al., 1976). Instead, faithful transcription of these genes could be elicited in the presence of bulk chromatin, along with the addition of purified RNA polymerase III. These observations led to the suggestion that several protein components in chromatin are necessary to carry out the accurate and selective catalytic process. Several of these components appear to be shared among the transcriptional complexes of all Class III genes, because a partially purified RNA polymerase III transcriptional complex from Kc0 cells can potentiate transcription of both 5S rRNA and tRNA genes (Burke et al., 1983). As well, serum antibodies from patients with the autoimmune disease systemic lupus erythematosus have been shown to specifically immunoprecipitate ribonucleoproteins, or RNPs (Lerner et al., 1981).
These small RNP particles have been shown to form a complex with tRNAs and 5S rRNA in uninfected cells (Rinke and Steitz, 1982), and with VA RNA (Gottesfeld et al., 1984). Addition of the serum antibodies can inhibit transcription of 5S rRNA and tRNA genes in HeLa cell extracts, raising the distinct possibility that the inhibited antigen could be a basic component(s) of Class III transcriptional complexes (Rinke and Steitz, 1982). Fractionation of cytoplasmic extracts from human KB cells on phosphocellulose and by additional chromatographic steps revealed that, in addition to RNA polymerase III, at least two other distinct fractions were required for reconstitution of specific tRNA gene transcription (Segall et al., 1980). The interchangeability of the phosphocellulose factors from human cells with similar fractions derived from Xenopus also suggests that these components are evolutionarily well conserved (Shastry et al., 1982). Similar purification steps carried out with cell extracts derived from Bombyx and S. cerevisiae appear to correlate well with the human studies (Ruet et al., 1984). The above investigations would thus imply that the transcriptional complex contains at least two distinct fractions and RNA polymerase III. Although it is now well established that an additional factor, TFIIIA, is required for specific transcription of 5S rRNA genes, fractionation of other eukaryotic extracts has thus far failed to reveal any further transcriptional components (Segall et al., 1980; Burke et al., 1983; Shastry et al., 1982; see below). The 3' ICR of tRNA genes appears to bind stably to at least one component in the transcriptional complex (Lassar et al., 1983; Newman et al., 1983). The specificity of this interaction has been examined by DNase I protection or competition assays (Klemenz et al., 1982; Fuhrman et al., 1984; Van Dyke and Roeder, 1987).
In addition, a yeast SUP53 tRNA gene carrying a mutation of the highly conserved nucleotide C56 to G56 fails to bind this factor(s) and shows a concomitant decrease in its competitive ability (Newman et al., 1983). This factor, variously referred to as Factor C, TFIIIC or tau, has been partially purified from yeast (Ruet et al., 1984) and HeLa cells (Fuhrman et al., 1984). The HeLa cell factor is required for formation of stable transcription complexes and for faithful transcription of both an adenovirus VAI gene and the Bombyx tRNA2Ala gene. Recently, Van Dyke and Roeder (1987) have suggested that TFIIIC may exist in two distinct forms, a cytoplasmic form and a nuclear form. Both forms of TFIIIC possess functional activity when assayed by in vitro transcription using a VAI RNA gene as the template. However, DNase I protection experiments showed that only the nuclear form is able to afford a protection ladder in the 3' ICR. Because the cytoplasmic form is incapable of binding and is physically segregated from the gene, it was suggested that this form may be inactive in vivo (Van Dyke and Roeder, 1987). Barring experimental artifacts, they further proposed that a general mechanism of Class III gene regulation may depend upon the interconversion of the active and nascent forms of TFIIIC. Similar results have also been obtained by Yoshinaga et al. (1987), although their interpretation of the data differed slightly in detail. By chromatography and sedimentation velocity gradient analysis, they showed that the two forms of TFIIIC, named 1 and 2, are distinct components of approximately 400-500 kDa and 200 kDa, respectively, rather than products of simple activation of the same nascent form. TFIIIC2 can bind tightly to the 3' ICR, while TFIIIC1 has very low affinity for the 5' ICR, as revealed by DNase I protection experiments. Either form alone can sustain only barely detectable transcription in vitro using the VAI gene as the template.
However, active transcription complexes can be reconstituted in the presence of both components. While the VA RNA gene is able to interact with TFIIIC without other components, stable complex formation with tRNA genes, at least those of Drosophila and human, requires the presence of another factor, TFIIIB. Furthermore, TFIIIB does not appear to remain stably bound, but recycles rapidly (reviewed by Lassar et al., 1983). This is consistent with the mechanism described by Dingermann et al. (1983) for stable complex formation using various deleted tRNA2Arg genes in an unfractionated Drosophila Kc0 cell extract. In their scheme, the two transcription factors, TFIIIB and TFIIIC, interact with the D and T control regions of the gene, respectively. From competition experiments, it appears that both the 5'- and 3'-flanking sequences aid in the binding stability of these factors (Sharp et al., 1983b; Schaack et al., 1983). The cooperative binding of the two factors would then bring about stable complex formation, although the binding activity of TFIIIB has yet to be shown experimentally. The factors described above, however, appear to display some species specificity. In some cases, the Drosophila Kc0 cell TFIIIC is functionally equivalent to that in HeLa cell extracts, and can be combined with human TFIIIB to promote transcription of tRNA genes. Sharp et al. (1985) suggested that for this to occur, there may be some type of "compatibility" between the TFIIIC and the 5'-flanking region of the tRNA gene. Such "compatibility" may not always be maintained between species. As well, the Drosophila TFIIIB cannot replace the human counterpart in the heterologous transcription system (Dingermann et al., 1982; Burke et al., 1983). Thus, tRNA gene transcription may involve a general mechanism, but a higher level of complexity is raised by the additional revelation of species specificity.
From competition assays, tRNA genes have been shown to sequester, rapidly and stably, a limiting component when added to transcriptionally active cell-free extracts. These assays rely on the ability of a test gene (or gene fragments) to inhibit the transcription of a "reference" gene. Mutant Drosophila tRNAArg genes with varying degrees of deletion from either the 5' or the 3' side were examined for the ability to compete for limiting components (Sharp et al., 1983b), and it appears that the 3' ICR is the most important region for stable complex formation. However, the stability and the rate of binding are also affected by the presence of DNA throughout the coding as well as the flanking regions. In particular, removal of the 5' ICR invariably leads to a drastic reduction in the maximum rate and strength of complex formation. This kinetic effect implies that the stable complex relies on some kind of recognition of the 5' ICR prior to stable binding to the 3' ICR (Schaack et al., 1983). While formation of this stable complex for the tRNAArg gene occurs fairly rapidly, there is a further lag phase of 10 to 30 minutes before transcription is detectable (Schaack et al., 1983). This latent period is also temperature sensitive within the range of 24 °C to 30 °C, suggesting a second priming step, perhaps involving rearrangement of the tRNA gene and the bound components prior to initiation of transcription (Sharp et al., 1985; Schaack et al., 1983).

Maturation of tRNA Transcripts

tRNA transcripts are initially synthesized as precursors containing extra nucleotides at both the 5' and the 3' ends. The initiation sites of transcription are usually 4-7 nucleotides upstream of the structural gene, and normally coincide with a purine residue. Termination is thought to involve an oligothymidylate sequence that is located near the 3' end of the mature coding sequence.
Maturation of the transcript probably occurs by removal of, first, the 5' and then the 3' extra nucleotides by ribonucleases, followed by addition of the CCA end by nucleotidyltransferase (reviewed by Young et al., 1980; Lund et al., 1980; Ghosh and Deutscher, 1980). If intervening sequences are present, they are excised by splicing enzymes. The accuracy of the process depends on the proper conformation of the tRNA precursor (Ogden et al., 1979). Since the tertiary structure is maintained by unconventional base pairing involving modified nucleotides, this would imply that at least a limited amount of modification must have occurred before excision of the intron. The trimming at the 5' end is thought to require an RNase P endonuclease, first identified in E. coli by both biochemical and genetic means (Shimura et al., 1980). An enzyme with similar catalytic activity has also been partially purified from Sc. pombe. This enzyme can process the 5' end of an E. coli tRNATyr precursor, plus a variety of yeast tRNA precursors produced in vitro, to the mature 5' terminus (Kline et al., 1981). From these in vitro studies, this reaction appears to be a one-step process. The in vivo transcripts of yeast tRNATyr, tRNA2Ser and tRNAminorSer genes have been examined by a modified Northern blotting procedure (Hopper and Kurjan, 1981). For all three tRNA genes, they were able to detect only three species of transcripts. Two correspond to the hypothetical precursors of 108 and 92 nucleotides; the latter is probably a transcript with both the 5' and 3' extra nucleotides removed but retaining the intron. The last transcript is 78 nucleotides in length, corresponding to a full-size mature tRNA. Thus, their results agree with the in vitro studies, suggesting that both the 5' leader and the 3' tail are removed in a single-step process.
However, injection of a yeast precursor tRNATyr into the nucleoplasm of the Xenopus oocyte showed that removal of the 5' leader is at least a three-step reaction with progressive removal of small oligonucleotides, rather than a single catalytic step (Melton et al., 1980). The reasons for the discrepancy are not known, but there could be some fundamental differences among the organisms used as model systems. The enzymology of splicing of eukaryotic tRNAs has been best characterized in yeast. Several temperature-sensitive mutants defective in the process have been isolated (Peebles et al., 1979). The precursors accumulated at the nonpermissive temperature in the rna1 or los1 mutants have provided substrates for the assay of splicing in vitro. Peebles et al. (1979) isolated an activity from a ribosome wash that is capable of removing the introns from all ten of the precursors accumulated in the mutant strains. The splicing components appear to be fairly pure, since approximately 96% of the input tRNA precursors can be spliced, with very little random degradation or abortive splicing. In this in vitro system, smaller RNAs with the mobility of half-tRNA-sized molecules occasionally appear transiently. From their kinetic analyses, the authors proposed that these half-molecules are probably true intermediates in the splicing reaction. Furthermore, accumulation of higher amounts of the half-molecules can be enhanced by the omission of ATP. These half-molecules have subsequently been purified by gel electrophoresis and shown to be substrates for the second step in the splicing process, formation of a phosphodiester bond between the two half-tRNAs (Knapp et al., 1979). Both the splicing endonuclease and the ligase required for rejoining the processed tRNA have been physically separated, although in vivo they may be integral components of a larger splicing complex.
In these subsequent reports, the endonuclease appears to be integrally bound in membrane, rather than associated with the ribosomes as reported initially (Peebles et al., 1983). The discrepancy has never been explained, but it could be due to contamination by rough endoplasmic reticulum, which is intimately associated with ribosomes. The activity of the enzyme is stimulated in the presence of spermidine (to stabilize the secondary structure of the tRNA precursor) and non-ionic detergents. The ligase, however, appears to be a peripheral protein also associated with membrane, but it is easily dislodged during the preparation steps (Greer et al., 1983). Its activity is stimulated in the presence of Mg2+ and ATP. Their intimate association with membranes is consistent with the observation that splicing of pre-tRNA is coupled with transport of the mature transcript from the nucleus. Also, in the los1 and rna1 yeast mutants, precursors are found to accumulate in the nucleus (Guthrie and Abelson, 1982).

Fig. 5. Splicing of tRNA in yeast. The pathway proposed for joining of tRNA halves by yeast ligase is summarized after Greer (1986). The first step shows the formation of halves from pre-tRNA by yeast endonuclease. The sequence of the subsequent reactions is tentative since the precise order has not been determined and many of the enzymes involved have not been identified. IVS, intervening sequence. The different symbols around the phosphates are to facilitate tracing each through the ligation pathway. The yeast product (last step in the diagram) shows a 2' phosphate which is subsequently removed by a phosphatase.

Characterization of the splicing intermediates has revealed several interesting features (fig. 5). The intervening sequence is probably excised as a discrete, linear polynucleotide
with 5'-OH and 2',3'-cyclic phosphodiester termini (Knapp et al., 1979). Similar analyses of all the gapped tRNA products in yeast reveal that in each case the endonuclease reaction produces a 2',3'-cyclic phosphate in the 5' half-tRNA and a 5'-OH terminus in the 3' half. It has been proposed that a cyclic phosphodiesterase activity associated with the ligase can catalyze the formation of a 2'-P at the 5' half-tRNA. The 5'-OH terminus of the 3' half-tRNA is then phosphorylated, the 5'-phosphate is adenylated by an activated ligase, and this is followed by ligation of the half-molecules and release of AMP. The 2'-P is subsequently removed by an unknown phosphatase (Greer et al., 1983). In HeLa cells, and possibly also in Xenopus, no 2',3'-cyclic phosphates have been found (Filipowicz and Shatkin, 1983). In these higher eukaryotes, the 5' half- and 3' half-molecules contain only 3'-P and 5'-OH groups, respectively, and the ligation steps probably involve a slightly different RNA ligase. As mentioned in an earlier section, the transcription of tRNA genes is not particularly influenced by the presence of the intervening sequence. However, as shown by Johnson and Abelson (1983), the proper splicing of the intron in a yeast SUP6-o tRNA is required for the correct modification of the mature transcript in an ensuing step. The precise deletion of the intron from the gene significantly reduced the suppressor activity of its product relative to that of the unaltered gene. Analysis of the anticodon of the tRNA showed that the sequence normally contains a Ψ in its middle position. Removal of the intron somehow engendered a defect in the pseudouridylation reaction as well as a concomitant decrease in the amount of suppressor tRNA.
As to the absence of Ψ in the anticodon, the authors suggested that the "Ψ synthase" probably requires the proper pairing of the intervening sequence with the anticodon. It is important to note, though, that tRNATyr is the only yeast tRNA sequenced to date that contains Ψ in the anticodon. The presence of intervening sequences in the other tRNAs does not appear to be correlated with anticodon modifications in general.

Other Unusual tRNA-Mediated Cellular Functions in Eukaryotes

1. Protein Degradation

Besides the familiar role of tRNAs in protein synthesis, they have also been implicated in the ubiquitin- and ATP-dependent pathway of protein degradation. In an earlier investigation, it was shown that a free α-NH2 group of the protein substrate is an important structural determinant for recognition by the ubiquitin system (Hershko et al., 1984 and 1986). More recently, Ferber and Ciechanover (1987) have shown that tRNA is essential for conjugation of ubiquitin to, and the subsequent degradation of, proteins with acidic amino termini. Both bacteria and eukaryotes contain an unusual class of enzymes, aminoacyl-tRNA-protein transferases, which catalyze post-translational conjugation of specific amino acid residues to the mature amino termini of acceptor proteins. The best studied enzyme so far is the arginyl-tRNA-protein transferase (Ferber and Ciechanover, 1987), which transfers an arginine to the amino terminus of proteins that are destined for proteolysis. The degradation process can be inhibited by the addition of either RNase A or micrococcal nuclease and can be entirely restored by the subsequent addition of aminoacylated tRNAArg after removal of the nucleases, suggesting that this tRNA species is critical in the proteolysis pathway. More recently, another possible tRNA- and ATP-dependent modification, histidylation of substrates with acidic amino termini, is being investigated (Ciechanover et al., 1985; Ferber and Ciechanover, 1987).
Modification of proteins by lysine and leucine has also been reported (Shyne-Athwal et al., 1986), although its relevance to proteolysis by the ubiquitin system is as yet unclear. These discoveries have kindled another interesting question: are only certain isoacceptors of a tRNA species relegated to this special function, or are all isoacceptors equally accessible to the ubiquitin-dependent proteolysis pathway?

2. Primers for Reverse Transcription of Retrotransposons

Molecular analyses of several mobile elements identified in the Drosophila genome have shown that they contain sequences that would encode putative enzymes similar in amino acid sequence to the retroviral reverse transcriptases. One class of these transposable elements, known as copia, has been extensively examined. Copia-related virus particles with reverse transcriptase-like enzyme activity have been identified in Drosophila cells (Shiba and Saigo, 1983). Furthermore, Flavell (1984) has found linear and circular extrachromosomal copia sequences that can be attributed to reverse transcription. His conclusions have been extended by Arkhipova et al. (1984), who detected genome-sized RNA-DNA complexes that are presumably intermediates in the reverse transcription of two Drosophila retrotransposons, mdg1 and mdg3. Proper initiation of retroviral reverse transcription requires a particular species of host cell tRNA as a primer, which can bind to the viral genomic RNA via 18 Watson-Crick base pairs (Varmus, 1983). Such primer tRNA has been shown to interact specifically with retroviral reverse transcriptase. The high affinity of the enzyme for the exposed surfaces of the L-shaped tRNA (the stem structures) may be related to the enzyme's ability to open the acceptor stem, allowing the denatured stem to bind to the retroviral primer binding site (Garret et al., 1984). Using synthetic oligonucleotides as probes, Inouye et
al. (1986) have isolated three potential tRNA primer coding sequences for the retrotransposon 297. Sequence analysis showed that they are related to tRNA4Ser and tRNA7Ser genes. They have also isolated both tRNAs and showed that tRNA7Ser contains the predicted 18 nucleotides (including the CCA-OH end) from the 3' end exactly complementary to the putative primer binding site of 297, while tRNA4Ser differs by one nucleotide. Thus, the authors proposed that tRNA4Ser and/or tRNA7Ser can probably act as primers for this class of retrotransposons. Similar homologies between specific tRNAs and potential primer binding sites have also been noted for other retrotransposons such as 17.6 (tRNA4Ser/tRNA7Ser), 412 and mdg1 (tRNAArg), mdg3 (tRNALeu/tRNAIle) and gypsy (tRNALys) (reviewed by Saigo, 1986).

3. Chlorophyll Biosynthesis

A molecule of chlorophyll is synthesized from a series of intermediates that are light regulated. One intermediate step in the pathway is the conversion of glutamate to a penultimate compound known as δ-aminolevulinate or DALA (Schon et al., 1986). The components performing this conversion have been isolated from barley and Chlamydomonas and can be separated by serial chromatography. One of the components is extremely sensitive to ribonucleases and has been shown by direct nucleotide sequencing to be a chloroplast glutamate tRNA isoacceptor (denoted tRNADALA), which is encoded by the chloroplast genome. Glutamate attached by an aminoacyl bond to the CCA-OH end of the tRNA is the essential substrate for the subsequent steps in the biosynthetic pathway. The remaining two glutamate tRNA isoacceptors have also been purified from barley chloroplasts and examined for possible activity in the δ-aminolevulinate conversion reaction. Both showed negative results, even though both species can be efficiently charged by the aminoacyl-tRNA synthetases present in the preparation.
These results strongly suggest that tRNADALA is a highly specialized glutamate tRNA isoacceptor, probably adapted specifically for chlorophyll biosynthesis. However, the features that distinguish this isoacceptor from the other two have not been well characterized, except that tRNADALA appears to be hypermodified in the anticodon.

4. Induced and Naturally Occurring Suppressor tRNAs

Termination codons, or stop codons, are UAA (ochre), UAG (amber) and UGA (opal). These codons normally function to signal cessation of protein synthesis and release of the growing polypeptide from tRNA. Occasionally, mutations can occur within the reading frame, converting a "sense" codon into a stop codon. The consequence of this mutation is premature termination, resulting in the production of a truncated protein. Nonetheless, the detrimental effect of the stop codon can be relieved by "suppressor" tRNAs carrying mutations in the anticodon, which can pair with one of the stop codons and insert an amino acid in its place. The biology of these mutationally induced suppressor tRNAs in prokaryotes (Murgola, 1985) and in eukaryotes (Korner et al., 1978) has been extensively reviewed, and will not be discussed here. However, there have been reported cases of naturally occurring suppressor tRNAs that appear to constitute part of the normal and functional machinery of the cell (below).

Selenium is present in many biological systems in trace amounts, higher levels being toxic (Stadtman, 1974). More than 80% of the element can be traced to proteins containing selenocysteine, an analogue of cysteine in which the sulphur atom has been replaced by an atom of selenium. The question of whether selenium is incorporated into protein during translation or as a post-translational modification has long been a matter of contention.
Previous investigations have identified a specific selenocysteyl-tRNA (tRNASec), which suggests that the modified amino acid is directly incorporated into the protein during translation (Hawkes et al., 1985). This controversy appears to be resolved by recent reports on the cloning of two genes coding for selenocysteine-containing proteins: a mammalian glutathione peroxidase (Chambers et al., 1986) and the E. coli formate dehydrogenase (Zinoni et al., 1986). DNA sequencing of both genes showed that the triplet corresponding to the selenocysteine position in the protein is UGA, which is usually recognized as a termination codon. Thus, the use of UGA for encoding selenocysteine seems to apply to both eukaryotes and prokaryotes, and tRNASec would function analogously to a suppressor tRNA. However, how the cell distinguishes this supposed "nonsense" codon from its natural usage as the termination codon remains unknown.

Two naturally occurring opal suppressor serine tRNAs have been identified in mammalian, avian and Xenopus tissues (Hatfield et al., 1982; Diamond et al., 1981). In all cases, they represent about 1-3% of the total seryl-tRNA in these tissues. These natural suppressors are 90 nucleotides in length and are thus the longest tRNAs sequenced to date (Diamond et al., 1981; Hatfield et al., 1982). Those from bovine liver have been sequenced; they are more than 90% homologous (5 differences), and their anticodons are CmCA and NCA (N is probably a modified U). The most unusual aspect is that in all higher eukaryotic genomes studied, only one coding sequence is detectable. Since the two tRNAs differ by several pyrimidine transitions, including one in the wobble position of their anticodon as mentioned above, the implication is that these transitions must occur post-transcriptionally (Hatfield et al., 1982; O'Neill et al., 1985; Pratt et al., 1985; Lee et al., 1987).
Moreover, it was anticipated that at least the anticodon CmCA should recognize the tryptophan codon UGG, as predicted by simple Watson-Crick base pairing; instead, both isoacceptors have been shown to recognize UGA in ribosome binding assays, and this was confirmed in in vitro protein synthesis experiments. These isoacceptors are distinguished by their unique characteristic of forming phosphoseryl-tRNA in the presence of a kinase from bovine mammary (Hatfield et al., 1982) and liver tissues (Mizutani and Hashimoto, 1984). The enzymes have been partially purified and appear to consist of at least two different components. Moreover, these enzymes specifically phosphorylate only these two seryl-tRNAs and no other serine tRNA isoacceptors (Mizutani and Hashimoto, 1984; Sharp and Stewart, 1977). The unique property of the opal suppressor tRNASer to form phosphoseryl-tRNA may indicate some special role in cellular events requiring suppression. As with the tRNASec discussed above, they may translate only UGA codons that have the appropriate neighboring sequence context and dictate the insertion of phosphoserine directly into protein (Hatfield, 1985).

The Present Studies

The present investigation deals with the characterization of a group of genes coding for the major serine tRNA isoacceptors, tRNA4Ser and tRNA7Ser, which are localized to the polytene bands at 12DE on the X chromosome. From previous in situ hybridization, using highly purified tRNAs as probes, Hayashi et al. (1980) showed that this X-linked region constitutes the major hybridization site. Minor hybridization has also been detected at three autosomal loci: at 23E on the left arm of chromosome 2 (2L), 56D on 2R and 64D on 3L. From RNA sequencing of tRNA4Ser and tRNA7Ser,
Cribbs (1982) demonstrated that these two different tRNA isoacceptors are highly homologous in sequence, having only three differences in 86 nucleotides: the C16, I34 (inosine) and A77 in tRNA7Ser are replaced by D16 (dihydrouridine), C34 and G77, respectively, in tRNA4Ser. Note that the numbering system for the tRNA4,7Ser here does not follow the convention in Sprinzl et al. (1987), where the 77th nucleotide should actually be number 68. I have used the alternative system, which counts the actual nucleotides in the molecules, to maintain consistency in the numbering between the tRNA and the corresponding gene. This numbering system was first adopted by Cribbs (1982), and thus for historical reasons it is also preferred in order to maintain consistency between his report and this one. The cloverleaf structure of tRNA7Ser is shown in Fig. 6; its differences from tRNA4Ser are indicated accordingly. Because the two tRNAs are highly similar in sequence, they are indistinguishable by hybridization; thus, in situ hybridization does not convey the actual distribution pattern, that is, whether both gene types are located at all four cytological sites or whether they are segregated at different sites.

The nucleotide at position 34 is in the anticodon and accounts for the different codon recognition of the two isoacceptors. tRNA4Ser is UCG-specific, while tRNA7Ser can read the codons UCA, UCC and UCU (White et al., 1975; Cribbs et al., 1987a). Since the two tRNAs recognize non-overlapping sets of codons, they are functionally distinct.

Five recombinant plasmids hybridizing to Drosophila melanogaster tRNA4,7Ser have been recovered by Dunn et al. (1979b). Sequences corresponding to their putative genes have been obtained (Cribbs, 1982; Newton, 1984).
Since the coding sequences corresponding to either tRNA4Ser or tRNA7Ser are expected to differ at the three nucleotides, for convenience (and for describing "hybrid" genes later, see below) Cribbs (1982) has designated them as either 444 or 777 genes, based solely on the three diagnostic differences.

The major impetus for my present investigations stems from the molecular analysis of the coding sequences for tRNA4Ser and tRNA7Ser (Cribbs, 1982; Newton, 1984). Four of the plasmids, pDt17R, pDt16R, pDt73 and pDt27R, are all derived from the major X-linked site at 12DE; the other, denoted pDt5, hybridizes to the 23E site on chromosome 2 (Hayashi et al., 1980). DNA sequence analysis showed that both pDt5 (Newton, 1984) and pDt17R (Cribbs, 1982) contain a single 777 gene, that is, one corresponding to tRNA7Ser. pDt27R contains two 444 genes matching the predicted sequence of tRNA4Ser (Newton, 1984). In this thesis, I have referred to these genes, with known corresponding tRNA products, as "bona fide" genes. pDt16R contains two genes: one corresponds to an expected 777, the other is a 774. The latter gene is a "hybrid" structure, with positions 16 and 34 allied with tRNA7Ser but with the last nucleotide, at position 77, characteristic of tRNA4Ser. pDt73 contains a 474 gene, which has the anticodon of a tRNA7Ser but whose other two nucleotides are diagnostic of tRNA4Ser. Hence, the entire cast of genes appears to form a graded series of intermediate sequences ranging from tRNA4Ser to tRNA7Ser.

Fig. 6. Sequence of tRNA7Ser. The three nucleotides that distinguish tRNA4Ser from tRNA7Ser are replaced as shown at positions 16, 34 and 77. [From Cribbs (1982).]
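The three-digit designation can be illustrated with a short sketch. It is not part of the original analysis: the function names are hypothetical, and the gene-level bases are an assumption inferred from the tRNA residues quoted above (D16 is taken to be encoded as T, and I34 as A).

```python
# Diagnostic nucleotides at gene positions 16, 34 and 77 (numbering of Cribbs, 1982).
# Assumption: DNA-level bases are inferred from the tRNA residues given in the text
# (dihydrouridine D16 is encoded as T; inosine I34 is encoded as A).
DIAGNOSTIC = {
    16: {"T": "4", "C": "7"},
    34: {"C": "4", "A": "7"},
    77: {"G": "4", "A": "7"},
}


def classify_gene(bases):
    """Return a three-digit label such as '444', '777', '774' or '474'.

    `bases` maps each diagnostic position (16, 34, 77) to the base found there.
    A base that matches neither isoacceptor is reported as '?'.
    """
    label = ""
    for position in (16, 34, 77):
        base = bases[position].upper()
        label += DIAGNOSTIC[position].get(base, "?")
    return label


def is_hybrid(label):
    """A hybrid gene mixes tRNA4Ser-type and tRNA7Ser-type diagnostic positions."""
    return "?" not in label and label not in ("444", "777")


if __name__ == "__main__":
    examples = {
        "777-type": {16: "C", 34: "A", 77: "A"},
        "444-type": {16: "T", 34: "C", 77: "G"},
        "774 hybrid": {16: "C", 34: "A", 77: "G"},
        "474 hybrid": {16: "T", 34: "A", 77: "G"},
    }
    for name, bases in examples.items():
        label = classify_gene(bases)
        print(name, label, "hybrid" if is_hybrid(label) else "bona fide type")
```

Under these assumptions, the two hybrid genes described above come out as 774 and 474, while the bona fide genes are labelled 444 or 777.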
As pointed out previously by Cribbs (1982), within the same tRNA gene family, members encoding functionally distinct isoacceptors usually show sequence divergence of 10-30% (Sprinzl et al., 1987). Also, the pattern of mutational events in the different members tends to be random; that is, neither the positions nor the types of the nucleotide changes in the genes can be reliably predicted. In contrast, it is unusual that the genes coding for the two functionally distinct serine isoacceptors, tRNA4Ser and tRNA7Ser, would show such a high degree of homology (96%). Further, it is even more striking that the related variant genes are merely permutations of the above, with the nucleotide changes played out with almost absolute predictability. These observations prompted Cribbs (1982) to speculate that the tRNA4Ser and tRNA7Ser genes are probably not free to diverge; rather, their similarity in sequence would allow them to keep in check with one another and to evolve continually as a cohesive unit. The intermediate forms would thus reflect the imperfections in this "checking" process. The driving force for such a maintenance process is not clear, but has been suggested by Cribbs (1982) to be non-reciprocal recombination, specifically gene conversion.

Such a concept would fit well with the concerted evolution of other multigene families (both coding and non-coding), which has been forged into a unifying theory known as Molecular Drive by Dover (1982). The formulation of this theory is based on the observation that in many multigene families that are prevalent in many different species (for example, tRNA genes), the members exhibit unexpected and substantial sequence homogeneity within a species but not between species. Family homogeneity, or cohesive evolution, could be achieved by several molecular mechanisms.
For gene families with their members arranged in tandem arrays, such as rDNA (Coen and Dover, 1983), both gene conversion and standard recombination are thought to operate to maintain the sequence homogeneity of the gene members. However, for multigenes that are irregularly spaced and in random orientation, such as the tRNA genes, standard recombination may cause duplication and deletion of the gene members. Instead, it has been hypothesized that gene conversion may be the predominant force in sequence maintenance. When variations arise in a member of the family, they may become fixed in a population as a consequence of stochastic and directional (biased) transmission of the variation. This concerted pattern of fixation of variations, or sequence turnover, in a gene family and in a population is defined as Molecular Drive. This is in opposition to the Mendelian mode of evolution, which is modeled on the premise that mutations are unitary and passive events, and that their spread through the gene family relies on the activities of selection and the vagaries of drift. Note that these two activities, in turn, must rely on basic theoretical and empirical assumptions that would allow appropriate allotment of "adaptive" and "non-adaptive" values to each. No such assumptions are necessary for Molecular Drive, since this alternative process in multigene evolution cannot be studied mathematically using traditional ad hoc assumptions.

The other major impetus for the present work stems from the phenomenon of dosage compensation associated with the X chromosome in Drosophila. Females have two X chromosomes, while males have one; but despite the dosage differential of the X chromosome in the two sexes, most of the X-linked genes exhibit a more nearly equal expression than expected based strictly on the number of genes present. That is, the normal two-dose female is roughly equivalent to the one-dose male in X-linked gene expression.
This equalization or buffering effect was first recognized by Muller in 1932 (reviewed by Stewart and Merriam, 1982) and has been termed dosage compensation. The phenomenon can manifest itself in another way. In mutants of a short segmental aneuploid series (rather than in chromosomally wild-type males and females as discussed above), the genes in question can exhibit a dosage effect. In females with a small deletion removing one copy of the gene, the total output of the gene product at that site would be only 50% of that in males, even though both sexes are now hemizygous for this gene. In males with a duplicated copy of a gene, the total output would be twice that of normal females (i.e. both sexes have two copies of the X-linked gene). As predicted from this dosage effect rule, it follows that a duplicated female with three copies of a gene would be only 50% more active at that locus than wild-type males, despite a three-fold difference in gene copy number. Hence, in an apparently paradoxical manner, the ability of the X chromosome to maintain equal expression between the sexes is also reflected in the differential escape of the sexes from the dosage compensation mechanisms in the segmental aneuploids.

To test whether the tRNA4Ser and tRNA7Ser genes follow the rules of dosage compensation, Birchler et al. (1982) analyzed the dosage effects of the genes in genetic crosses using X:Y translocations (Stewart and Merriam, 1974 and 1975) that result in progeny with one, two or three doses of the 12A-13A region in females and one or two doses in males. If the locus responds to compensation, then the level of gene product would be expected to be directly proportional to the dosage of the short chromosomal segment in each sex, but the expression in males would be approximately twice as great per copy.
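The arithmetic behind these expectations can be written out as a minimal sketch. It assumes an idealized model that is not taken from Birchler et al. (1982): output is strictly proportional to copy number within a sex, and each male copy is exactly twice as active as each female copy. The names and the arbitrary units are hypothetical.

```python
# Idealized dosage model (an assumption for illustration only):
# output = per-copy activity x number of copies, with male copies twice as active.
PER_COPY_ACTIVITY = {"female": 1.0, "male": 2.0}  # arbitrary units


def expected_output(sex, copies):
    """Expected gene product from `copies` doses of an X-linked segment."""
    return PER_COPY_ACTIVITY[sex] * copies


if __name__ == "__main__":
    wild_type_male = expected_output("male", 1)
    wild_type_female = expected_output("female", 2)

    # Wild-type male and female are equal: this is dosage compensation.
    print("male (1 dose) / female (2 doses):", wild_type_male / wild_type_female)

    # Deletion female (1 dose) relative to wild-type male (1 dose).
    print("female (1 dose) / male (1 dose):", expected_output("female", 1) / wild_type_male)

    # Duplication male (2 doses) relative to wild-type female (2 doses).
    print("male (2 doses) / female (2 doses):", expected_output("male", 2) / wild_type_female)

    # Duplication female (3 doses) relative to wild-type male (1 dose).
    print("female (3 doses) / male (1 dose):", expected_output("female", 3) / wild_type_male)
```

With these assumptions the four ratios are 1.0, 0.5, 2.0 and 1.5, matching the equal wild-type expression, the 50% output of the deletion female, the two-fold output of the duplication male and the 50% increase in the three-dose female described above.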
Although their results are complicated by the presence of other minor tRNASer sites on the autosomes, they do suggest that the X-linked tRNA4Ser genes are compensated but, interestingly, not the tRNA7Ser genes.

Ideally, both phenomena, concerted evolution and dosage compensation of the tRNA4,7Ser genes, should be investigated by using a unique, or at least a distinguishable, marker that can be easily followed. Indeed, this approach has been exploited in the extensive genetic and molecular analyses of gene conversion of suppressor tRNASer genes in S. pombe (Munz et al., 1981). However, a parallel approach in Drosophila would be more difficult since the groundwork on suppressor tRNAs is virtually non-existent. Furthermore, genetic screens for convertants of tRNA genes are relatively simple in yeast; both lethality and phenotypes based on spore colours can be engineered to assist in the recovery of convertants. While this may be possible in theory, a similar approach with the Drosophila tRNA4,7Ser genes may be a much more difficult and laborious task in practice.

As an alternative, I have elected to molecularly walk the 12DE region in order to analyze all members within this gene cluster (Chapter I). An immediate benefit from this expansive cloning study would be a clearer identification of the number and types of genes encoded by this region, which has not been resolved by in situ hybridization. DNA sequences of these genes, in conjunction with the autosomal copies of tRNASer genes being studied by Dr. D. A. R. Sinclair, should at least provide some idea of whether the hybrid genes are likely to be reciprocal or non-reciprocal products. I have also attempted to address the possible origins of these hybrid genes in three ways.
The first (Chapter II, Part I) is to identify "sequence signatures" in the flanking regions of the hybrid genes that may delineate their possible relationship with the rest of the tRNA4,7Ser genes at 12DE and, where possible, with those on the autosomes. The second is to analyze strain differences in representative tRNA4,7Ser genes. It was reasoned that if different permutations of the hybrid genes can be identified at homologous loci, then this would attest to the dynamics of sequence turnover as intimated by Cribbs (1982), and would provide convincing evidence for interactions between the tRNA4,7Ser genes. The third (Chapter II, Part II) is a conjoined study of other Drosophila sibling species that have diverged from melanogaster for various increments of time. This last approach should delimit the approximate times of origin of the hybrid genes.

The cross-species approach should also provide a deeper insight into their mode of evolution. If the tRNASer genes are in fact undergoing a cohesive mode of evolution, then the prediction based on the theory of Molecular Drive would be that gene members or their surrounding sequences should show more sequence similarity within a species than between species. Furthermore, for co-evolution of irregularly spaced multigenes, conversion has been invoked as the predominant mode of transmission of genetic variations. In the current molecular models of conversion, heteroduplex formation has been espoused as the key intermediate step in the process (see Discussion). If the hybrid tRNA4,7Ser genes are formed as a consequence of conversion, this would suggest that DNA slippage and mispairing between the tRNA4Ser and tRNA7Ser genes would be required as the initiating events. The distinct possibility of such an occurrence, slip-sliding of DNA, has been hinted at by the previous studies on a tRNAArg cluster located 600 bp downstream from the tRNA4Ser genes within pDt27R mentioned previously (Newton, 1984).
The tRNAArg genes are arranged as four tandemly duplicated units, including large amounts of flanking sequences. Each duplicated unit is demarcated on either side by an eight base pair direct repeat, TAGCCCAA. This duplication pattern conveys the impression that the units are likely to have been formed by unequal crossing-over near the short direct repeats. Thus, in Chapter III, I have used the tRNAArg genes as independent "markers" to test whether DNA slippage can account for both the gene duplication and the gene conversion observed at 12DE, by examining the organization of tRNAArg genes in distantly related melanogaster sibling species.

The general organizational pattern imparted by the tRNA4,7Ser genes at 12DE may suggest why the tRNA4Ser genes are dosage compensated while the tRNA7Ser genes are not, despite their close proximity. One possible scenario could be that the two gene types are segregated at this chromosome site, permitting some form of differential regulation at the level of dosage compensation. Alternatively, the supposed inability of the tRNA7Ser genes to dosage compensate could be an artifact stemming from insufficient sensitivity in the assays employed by Birchler et al. (1982). In Chapter V, I have also attempted to examine this problem by correlating the presence of repetitive sequences surrounding the tRNASer genes and the promoter region of white, to search for potential candidates involved in dosage compensation.

Chapter IV is a tangential excursion into the sequence organization of tRNA3bVal at the chromosomal bands 90BC, as part of a comprehensive analysis of the in vitro and in vivo expression of these genes (Dunn et al., 1979a; Larsen et al., 1982).

METHODS AND MATERIALS

REAGENTS

Enzymes Used in Molecular Cloning

Restriction endonucleases were purchased from Bethesda Research Laboratories (BRL), New England BioLabs (NEBL), Boehringer Mannheim Canada (BMC) and P-L Biochemicals (P-L).
Other enzymes were purchased from the following sources:

Enzyme                          Suppliers
Klenow enzyme                   Promega, BMC, P-L, BRL
Polynucleotide kinase           P-L, BMC, Dr. D. L. Cribbs
E. coli DNA polymerase I        P-L, BMC
S1 nuclease                     BRL
DNase I                         BMC
Calf intestinal phosphatase     BMC
Ribonuclease A                  Sigma
Proteinase K                    BRL, BMC
Lysozyme                        Sigma
T4 DNA ligase                   P-L, BRL
T4 RNA ligase                   BRL, Dr. D. L. Cribbs

Oligonucleotides

All oligonucleotides used in this work are listed in Table I.

TABLE I - LIST OF OLIGONUCLEOTIDES

Name     Sequence                        Supplier
GT6      5'-GCAGTCGTGGCCGA-3'            T. Atkinson
GT7      5'-CGCTCCCAGAGGGAATCTG-3'       T. Atkinson
Arg5'    5'-ATCCATTAGGCCACACGG-3'        T. Atkinson*
Arg3'    5'-CGAGTCCTGTCACGGTCG-3'        T. Atkinson*
F1       5'-GTAAAACGACGGCCAGT-3'         T. Atkinson*
R1       5'-CAGGAAACAGCTATGAC-3'         T. Atkinson*
Pex      5'-CCCAGTCACGACGTT-3'           P-L Biochemicals

* Purified by C. H. Newton
* Also purchased from P-L Biochemicals

Oligonucleotides synthesized by T. Atkinson (UBC) were supplied as a crude powder and were purified before use. They were dissolved in 100 µl of distilled sterile water, and an aliquot of 1-2 A260 units of the crude material (10-20 µl) was made 50% in formamide. The mixture was heated at 90 °C for 3 minutes and immediately applied to a 20% polyacrylamide denaturing gel (1% bis-acrylamide, 45 mM Tris-Cl pH 8.3, 45 mM boric acid, 1 mM EDTA, 8.4 M urea). Electrophoresis was carried out at 1,500 volts for about 3 hours, and the bands were visualized by shadowing over fluorescent silica gel plates under UV illumination. The band corresponding to the full-length oligonucleotide was excised with a scalpel, and the oligonucleotide was eluted overnight at 37 °C in 1 ml of 0.5 M ammonium acetate, 10 mM MgCl2. The supernatant was passed through a C18 SEP-PAK and the column was washed with successive 1 ml volumes of 60% methanol. The eluate containing the oligonucleotide was evaporated to dryness with a Savant Speed Vac Concentrator.
The oligonucleotide pellet was redissolved in 50-100 µl of TE and its concentration determined by UV absorbance at 260 nm.

Nucleotides

Deoxyribonucleoside triphosphates and 2',3'-dideoxyribonucleoside triphosphates were purchased from P-L Biochemicals. The nucleotides were dissolved in TE to approximate concentrations of 10 mM. The exact concentrations were determined spectrophotometrically. [α-32P]-deoxyribonucleoside triphosphates and [γ-32P]-ATP were purchased from Amersham. They were supplied in solution as triethylammonium salts with specific activities of approximately 3000 Ci/mmol. Cytidine 3'-monophosphate, containing an equal amount of cytidine 2'-monophosphate contaminant, was purchased from Sigma.

Phenol

Liquified phenol was purchased as an 88% aqueous solution from Mallinckrodt and purified by distillation by C. H. Newton. Aliquots were stored at -20 °C in the dark. When required, they were thawed at 65 °C; 8-hydroxyquinoline and β-mercaptoethanol were added to final concentrations of 0.1% (w/v) and 0.2% (v/v), respectively, and the phenol was stored in the dark at 4 °C. Just prior to use, a small volume was transferred to a glass test tube, extracted several times with 1 M Tris-Cl pH 8.0 and used for periods of up to one week.

Formamide

Analytical grade formamide was purchased from BDH Chemicals and deionized by stirring overnight with Bio-Rad mixed bed resin AG501-X8(D) (15 g/100 ml). The resin was dried in vacuo overnight before use. After deionization, the resin was removed by filtration through Whatman glass microfiber filters (934-AH), and the deionized formamide was stored in small aliquots in the dark at -20 °C.

Acrylamide

Acrylamide (Eastman Kodak) was stored at 4 °C in brown bottles as a 40% aqueous solution. Just before use, bis-acrylamide (Eastman Kodak) was added as required. The acrylamide:bis-acrylamide solution was deionized overnight as described (see Formamide above) and filtered through Whatman 3MM paper.
Agarose

Agarose (ultraPURE grade) used for most analytical gels was purchased from BRL. For isolation of DNA from preparative gels, low melting agarose purchased from Bio-Rad Laboratories was occasionally used.

Galactosides

Isopropyl-β-D-thiogalactoside (IPTG) was purchased from BRL. It was dissolved in distilled water to a final concentration of 100 mM and stored at -20 °C. 5-Bromo-4-chloro-3-indolyl β-D-galactoside (X-gal) was purchased from Sigma. It was used as a 2% solution in dimethylformamide and stored at -20 °C in the dark.

Autoradiography

Curix RP1 X-ray film (Gevaert) and Dupont Cronex Lightning-Plus intensifying screens used for autoradiography were purchased from local suppliers.

Supplies For Culture Media

Bacto-tryptone, Bacto-yeast extract and Bacto-agar were purchased from Difco. Type-A hydrolysate of casein (NZ amine) was purchased from Humko Sheffield Chemical (division of Kraft). Soy flour and live yeast for Drosophila cultures were purchased from local stores.

BACTERIAL STRAINS

The following strains of E. coli were used as hosts for recombinant DNA molecules:

LE392: F-, hsdR514 (rk-, mk-), supE44, supF58, lacY1 or Δ(lacIZY)6, galK2, galT22, metB1, trpR55, λ-; a derivative of the E. coli strain ED8654 (Borck et al., 1976; Murray et al., 1977).
RR1: F-, hsdS20, ara-14, proA2, lacY1, galK2, rpsL20, xyl-5, mtl-1, supE44, λ- (Bolivar et al., 1977).
JM101: Δ(lac, pro), supE, thi, strA, sbcB15, endA, hspR4, F' traD36, proAB, lacIq, lacZΔM15 (Messing et al., 1981).
JM103: Δ(lac, pro), thi, strA, supE, endA, sbcB, hsdR-, F' traD36, proAB, lacIq, ZΔM15 (Messing et al., 1981).
Q358: hsdRk-, hsdMk+, supE, φ80r (Karn et al., 1980).
Q359: hsdRk-, hsdMk+, supE, φ80r, P2 (Karn et al., 1980).
DH1: F-, recA1, endA1, gyrA96, thi-1, hsdR17 (rk-, mk+), supE44, λ- (D. Hanahan, 1983).
DH5α: F-, recA1, endA1, gyrA96, thi-1, hsdR17 (rk-, mk+), supE44, λ-, relA1, φ80dlacZΔM15 (D. Hanahan, 1985).
NS428: N205 (λ Aam11, b2, red3, cIts857, Sam7) (Sternberg et al., 1977).
NS433: N205 (λ Eam4, b2, red3, cIts857, Sam7) (Sternberg et al., 1977).
JC8111: recB21, recC22, sbcB15, recF143 (Horii and Clark, 1973).
SF8: hsdR-, hsdM-, recBC, lop-11 (ligase overproducer), supE44 (su2+), gal-96, SmR, leuB6, thi-1, thr (Davis et al., 1980).
SMR10: E. coli C-1a (λ cos2, ΔB, red3, xis1, gam am210, cIts857, nin5, Sam7)/λ (Rosenberg et al., 1985).

CULTURE MEDIA AND CONDITIONS

The following media were used for growth of E. coli:

LB: 1.0% Bacto-tryptone, 0.5% Bacto-yeast extract, 0.5% NaCl.
YT: 0.8% Bacto-tryptone, 0.5% Bacto-yeast extract, 0.5% NaCl.
2x YT: 1.6% Bacto-tryptone, 0.5% Bacto-yeast extract, 0.5% NaCl (Sanger et al., 1980).
LB-glucose: 1.0% Bacto-tryptone, 0.5% Bacto-yeast extract, 0.5% NaCl, 1% glucose.
TB ("Terrific Broth"): 1.2% Bacto-tryptone, 2.4% Bacto-yeast extract, 4% glycerol, 17 mM KH2PO4, 72 mM K2HPO4 (Tartof and Hobbs, 1987).
M9 Salts: 50 mM Na2HPO4, 25 mM KH2PO4, 8.5 mM NaCl, 20 mM NH4Cl, 1 mM MgSO4, 0.1 mM CaCl2, 10 mM glucose, 0.001% thiamine (Miller, 1972).
SOB: 2% Bacto-tryptone, 0.5% Bacto-yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4 (Hanahan, 1983).
SOC: 2% Bacto-tryptone, 0.5% Bacto-yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, 20 mM glucose (Hanahan, 1983).
NZYM: 1% NZ-amine, 0.5% Bacto-yeast extract, 0.5% NaCl, 10 mM MgCl2 (Leder et al., 1977).
LKB: 1% Bacto-tryptone, 0.5% Bacto-yeast extract, 1% NaCl, 4 mM NaOH (Rosenberg et al., 1985).

For plates, Bacto-agar was added to the liquid medium to a final concentration of 15 g/l. For top agar overlays, 7.5 g/l of Bacto-agar was used. In experiments where plaque lifts were anticipated, agarose was used in the top overlays instead. Strains harbouring plasmids were grown in media containing 25 µg/ml to 100 µg/ml of ampicillin, depending on the health of the host.
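As a worked illustration of the percentage recipes above, the following sketch converts % (w/v) values into grams for a chosen batch volume. It is only a convenience calculation under the usual assumption that 1% (w/v) means 1 g per 100 ml; the function names are hypothetical, and the LB composition and the 15 g/l agar figure are taken from the list above.

```python
# Convert percent (w/v) recipes into weighed amounts for a given batch volume.
# Assumption: 1% (w/v) corresponds to 1 g of solute per 100 ml of medium.

LB = {
    "Bacto-tryptone": 1.0,       # percent (w/v)
    "Bacto-yeast extract": 0.5,
    "NaCl": 0.5,
}


def grams_needed(percent_w_v, volume_ml):
    """Grams of a component required for `volume_ml` of medium."""
    return percent_w_v * volume_ml / 100.0


def recipe_grams(recipe, volume_ml):
    """Grams of every component in a percent (w/v) recipe."""
    return {name: grams_needed(pct, volume_ml) for name, pct in recipe.items()}


def agar_grams(volume_ml, grams_per_litre=15.0):
    """Agar for plates (15 g/l) or, with 7.5 g/l, for top agar overlays."""
    return grams_per_litre * volume_ml / 1000.0


if __name__ == "__main__":
    for name, grams in recipe_grams(LB, 500).items():
        print(f"{name}: {grams:.2f} g per 500 ml")
    print(f"Bacto-agar for plates: {agar_grams(500):.2f} g per 500 ml")
    print(f"Bacto-agar for top agar: {agar_grams(500, 7.5):.2f} g per 500 ml")
```

The same helpers apply to the other percentage-based media; the components given in mM would additionally need molecular weights, which are not stated in the text and are therefore left out of the sketch.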
To screen for E. coli hosts (JM101, JM103 and DH5α) harbouring vectors (M13, pUC and pEMBL) exhibiting the α-complementation phenotype at the lac locus, 50 µl of 2% X-gal and 10 µl of a 100 mM IPTG solution were either added to the soft overlay before plating cells (M13 transformants) or applied evenly onto the surface of the plates with the aid of a bent glass rod before spreading cells (pUC and pEMBL transformants). It was fortuitously observed that at least two other E. coli hosts (JC8111 and SF8) could also be screened by a "pseudo-α-complementation" phenotype. If the colonies of these strains were kept small (less than 0.2 mm), those harbouring recombinant pUC or pEMBL plasmids containing inserts remained pale green in colour on X-gal selection plates for a few hours longer than those harbouring wild-type vectors (dark blue-green). If these small colonies were stored at 4 °C, this "pseudo-α-complementation" phenotype could be prolonged and reliably applied as a selection scheme to these, and possibly also to other as yet untested, E. coli strains.

For routine experiments, cells were usually grown at 37 °C. Hosts used for plating λ phages the next day were usually cultured at 30 °C overnight with moderate shaking to prevent the cells from overgrowing. E. coli strains carrying the temperature-sensitive mutation cIts857 (NS428, NS433 and SMR10) were propagated at temperatures at or below 32 °C. Growth was monitored by measuring A550 using a Cary 210 spectrophotometer.

Fruitflies

Wild-type Drosophila melanogaster isogenic for all the major chromosomes was constructed by Dr. G. M. Tener (UBC). The D. melanogaster mutant bearing a deletion from 12A-12E on the X chromosome, Df(1)g-l f B/In(1)AM, was obtained from Dr. D. A. R. Sinclair (UBC). The Drosophila sibling species D. erecta, D. yakuba, D. teissieri and D. mauritiana were obtained from the Pasadena Drosophila stock center (Pasadena, California). D. simulans was obtained from Dr. T. A. Grigliatti (UBC).
Fruitflies were cultured on Drosophila soy food containing the following ingredients in one litre of tap water (Dr. G. M. Tener, unpublished): 100 g full fat soy flour, 20 g yeast extract, 17 g agar, 1 g citric acid, 9 g trisodium citrate, 40 g glucose, 40 g sucrose, 15 ml of 10% methyl p-hydroxybenzoate in 95% ethanol, 20 mg streptomycin and 10 mg tetracycline. Prior to DNA isolation, adult flies were cultured under non-crowded conditions and transferred to fresh food every 3-4 days. Live yeast was frequently seeded on the surface of the medium to increase the fecundity of the flies.

MASS COLLECTION OF EMBRYOS FROM D. melanogaster

For mass isolation of Drosophila embryos, flies were cultured at 25 °C in standard cages under high humidity and constant 12 hr light/dark cycles. Weigh boats containing 2% agar with a thin layer of yeast paste on top were placed inside the cages as collecting vessels. Embryos were collected every 12 hr by flushing the yeast paste into a small metal screen under lukewarm tap water. The retained embryos were rinsed free of debris and dechorionated in 50% bleach for 3 minutes, followed by several quick rinses under running tap water. The dechorionated embryos were transferred to 1.5 ml Eppendorf polypropylene tubes and stored at -70 °C.

PLASMIDS AND BACTERIOPHAGE VECTORS

The following vectors were used routinely in cloning:

M13 vectors: mp8, mp9, mp10, mp11, mp18 (Messing, 1983)
pEMBL vectors: pEMBL8- (Dente et al., 1985)
pUC vectors: pUC8 (Messing, 1983)
Cosmid vectors: pJB8 (Ish-Horowicz and Burke, 1981); cosPneo (Steller and Pirrotta, 1985)
λ vectors: EMBL3 (Frischauf et al., 1983); EMBL4 (Frischauf et al., 1983); λ2001 (Karn et al., 1984)

INTRODUCTION OF PLASMID AND DOUBLE-STRANDED BACTERIOPHAGE M13 DNA INTO Escherichia coli

Reagents

Routine transformation of E. coli was performed using a solution of 50 mM CaCl2. For E. coli strain RR1, a 100 mM CaCl2 salt solution was required (Dagert and Ehrlich, 1979). Both solutions were sterilized by autoclaving.
For preparation of frozen competent cells, modified reagents derived from Hanahan (1983 and 1985) were used. The salt solutions were made up as individual 10x stocks and sterilized by autoclaving. Hexamine cobalt (III) chloride was sterilized by filtration.

FB: 100 mM KCl, 50 mM CaCl2, 15% (v/v) glycerol, 10 mM potassium acetate, adjusted to a final pH of 6.2 (Hanahan, 1985).
MHB: 45 mM MnCl2, 10 mM CaCl2, 100 mM RbCl, 3 mM hexamine cobalt (III) chloride, 10% (v/v) glycerol (modified from Hanahan, 1983; M. Fettes, personal communication).

BACTERIAL TRANSFORMATION

For standard bacterial transformation, competent cells of E. coli were prepared using the CaCl2-mediated method described by Dagert and Ehrlich (1979). Cells were usually starved in CaCl2 for 12-16 hours at 4 °C before use to enhance their competence.

High efficiency competent cells of the E. coli strain DH5α were prepared using protocol I of "Frozen Storage of Competent Cells" described by Hanahan (1985). Ten colonies picked from an overnight SOB plate were used to inoculate 100 ml of SOB. The culture was grown to an A550 of 0.5 to 0.7 at 37 °C with good aeration and rapidly chilled by swirling on ice for 5 minutes. The cells were harvested by centrifugation at 2,000 rpm for 15 minutes at 4 °C in a clinical centrifuge. The pellet was resuspended in 1/3 volume of ice-cold FB and incubated on ice for 30 minutes. The cells were pelleted again as described and then gently dispersed in 1/12.5 volume of FB. Aliquots of 200 µl of the competent cells were dispensed into pre-chilled 1.5 ml Eppendorf polypropylene tubes. The cells were quick-frozen in an ethanol/dry-ice bath and stored at -70 °C until needed.

Competent cells of the E. coli strain JC8111 were prepared as described by Hanahan (1983) with minor modifications (M. Fettes, personal communication). Several colonies from a fresh plate were inoculated into 100 ml of 2x YT supplemented with 20 mM MgCl2 and grown to an A550 of 0.7-0.8.
The cells were chilled on ice for 15 minutes and pelleted by centrifugation at 3000 rpm for 5 minutes in an SS34 rotor. The pellet was resuspended in 1/3 volume of MHB and stored on ice for 30 minutes. The cells were pelleted again as described and resuspended in 1/12.5 volume of MHB. Two aliquots of DMSO (280 µl total) were added 10 minutes apart with the cells kept chilled on ice. After a further 5 minute incubation, 200 µl aliquots of the competent cells were flash frozen as described.

Frozen competent cells of strain DH5α or JC8111 were thawed at room temperature just prior to use. Plasmid DNA (< 20 µl) was added and the cells were stored on ice for 10 minutes. They were heat shocked at 42 °C in a heating block for two minutes (or 37 °C for 5 minutes for the weaker strain JC8111) and then cooled rapidly on ice for one minute. SOC was added to the cells to a final volume of one ml and the cells were incubated for 30 minutes in a 37 °C water-bath with occasional agitation by tube inversion. Aliquots of 10 µl to 100 µl of the transformed cells were plated on appropriate selective and indicator media.

When the bacteriophage M13 was used as the cloning vector, transformation was performed as described above using the E. coli strains JM101 or JM103 made competent by the CaCl2 method, except that after heat shock at 42 °C, 3 to 4 ml of soft agar overlay containing X-gal and IPTG were added to the cells. The contents were quickly poured onto plates that had been pre-warmed at 37 °C.

ISOLATION OF PLASMID AND DOUBLE-STRANDED M13 DNA

Two methods were used to isolate supercoiled plasmid and phage DNA from E. coli. The first method uses Triton X-100 to gently lyse cells (Davis et al., 1980) and was adopted exclusively for isolating DNA from large cultures (1 to 2 litres). The second method (Birnboim and Doly, 1979; Maniatis et al., 1982) employs SDS and alkali to lyse the cells, and has been generally used for rapid isolation of DNA from "mini-preps" of 1 to 2 ml cultures.
A scaled up version of this method has also been applied successfully in isolating DNA from large cultures. Both the Triton and alkaline lysis methods were satisfactory, but the latter was more convenient to use and was generally preferred.

LARGE SCALE DNA ISOLATION

Plasmid DNA

A single ampicillin resistant colony of E. coli was inoculated into 25 ml of LB or YT containing an appropriate concentration of ampicillin. The cells were grown overnight at 37 °C with vigorous shaking to ensure good aeration. The cells (10 ml) were inoculated into 1 litre of M9 medium and growth was continued until an A550 of 0.6 was reached. Chloramphenicol was added to a final concentration of 100 µg/ml (dissolved in 5 ml of 95% ethanol) and incubation was continued for 12-16 hours. The cells were harvested at 4 °C by centrifugation in a Sorvall GSA rotor at 6000 rpm for 10 minutes.

Double-stranded M13 DNA

The method employed here is adapted from a procedure by Dr. Mark Zoller (Cold Spring Harbor). A single colony of the E. coli strain JM101 or JM103 was inoculated into 2 ml of M9 salts and incubated overnight at 37 °C. Approximately 200 µl of the overnight culture was inoculated into 5 ml of 2x YT and growth was continued for 2 hr at 37 °C. The cells were diluted 10 fold with 2x YT. A small volume (1-2 ml) of the cells was infected with a single plaque of M13 and incubated at 37 °C for 6 hr. Another aliquot (4-5 ml) of the cells was inoculated into 500 ml of M9 salts and was grown to a cell density of A550 = 0.7. The small volume of M13 infected cells was then added to the large culture and incubation was continued at 37 °C for 90 minutes. The cells were collected by centrifugation at 4 °C in a GSA rotor at 6000 rpm for 15 minutes. They were lysed by the Triton X-100 method and the bacteriophage DNA was purified by two passages through CsCl gradients. Both of the lysis methods and the DNA purification procedure are described below.
Lysis by Triton X-100

The cell pellet was resuspended in 2.5 ml of 50 mM Tris-Cl (pH 8.0), 25% sucrose by gentle pipetting. EDTA was added to a final concentration of 250 mM followed by 2.5 mg of lysozyme. The cells were mixed by vortexing and then stored at 4 °C for 20 minutes. Lysis was achieved by adding 3.5 ml of 2% Triton X-100 followed by a further 10 minute incubation on ice. The lysate was cleared by centrifugation at 4 °C in a Beckman type-30 rotor at 25,000 rpm for 60 minutes. The cleared lysate was transferred to sterile 30 ml Corex tubes or polypropylene tubes. An equal volume of redistilled 1:1 phenol:chloroform (v/v) was added and the mixture was agitated by gentle vortexing. The phases were separated by a brief centrifugation in an SS34 rotor for 5 minutes. The phenol extraction procedure was repeated and then followed by two washes with chloroform. The aqueous phase was carefully transferred to a clean Corex tube. Sodium acetate (pH 6.0) was added to a final concentration of 0.3 M followed by 0.6 volume of isopropanol. The precipitated DNA was collected by centrifugation at 7,000 rpm in an SS34 rotor for 30 minutes at 4 °C. After brief drying, the pellet was resuspended in TE (10 mM Tris-Cl, pH 8, 1 mM EDTA). The DNA was treated with ribonuclease A (100 µg/ml) at 37 °C for 30 minutes and was further purified by CsCl equilibrium centrifugation.

Lysis by Alkali

Large Scale DNA Preparation

A scaled up alkali lysis procedure was used essentially as described by Maniatis et al. (1982) with minor modifications. A single ampicillin resistant colony was inoculated into 5 ml of LB or YT containing 50 to 100 µg/ml of ampicillin and grown overnight at 37 °C. A 2.5 ml aliquot of the culture was inoculated into 500 ml of the same medium and allowed to grow with vigorous shaking at 37 °C until the culture was almost saturated (A550 = 1.0 to 1.5). Cells were harvested by centrifugation at 4 °C in a GSA rotor at 5000 rpm for 10 minutes.
The pellet was resuspended in 5 ml of 50 mM glucose, 25 mM Tris-Cl (pH 8.0), 10 mM EDTA. Solid lysozyme was added to a final concentration of 5 mg/ml and mixed with the cells by gentle vortexing. After 5 minutes at room temperature or 20 minutes on ice, 10 ml of 0.2 M NaOH, 1% SDS was added by rapid ejection from a pipet. An ice-cold solution of 3 M potassium acetate (7.5 ml) was added and the contents mixed gently by inverting the tube 2-3 times. After 10 minutes, the lysate was cleared by centrifugation in an SS34 rotor at 10,000 rpm for 30 minutes at 4 °C. After this step, the DNA was treated identically as above in preparation for CsCl gradient centrifugation.

CsCl Gradient Purification of DNA

The DNA pellet was dissolved in TE and solid CsCl (1.13 g/ml) was added. Ethidium bromide was added to the DNA in the dark to a final concentration of 0.6 mg/ml. The contents were transferred to Beckman "quick-seal" tubes with a Pasteur pipet and sealed with a heat sealer. The tubes were centrifuged at 20 °C in a VTi65 rotor at 65,000 rpm for four hours or at 50,000 rpm for 14 hours. Plasmid DNA was identified with the aid of a long wave UV lamp (365 nm) and removed with a 3 cc B-D syringe equipped with a 26 gauge needle. The DNA was extracted several times with equal volumes of water-saturated n-butanol in the dark to remove the ethidium bromide. The CsCl was subsequently removed by dialysis in several changes of 20 mM Tris-Cl (pH 7.4), 1 mM EDTA at 4 °C. Alternatively, two volumes of distilled water were added to the DNA, which was then precipitated with two volumes of 95% ethanol. The concentration of DNA was determined by absorbance at 260 nm using a Cary-120 spectrophotometer, assuming that 1 A260 unit = 50 µg/ml (Davis et al., 1980).

Purification of DNA by Column Chromatography

When absolute purity was not required, plasmids were prepared by column chromatography based on a procedure developed by Dr. Ian Gillam (UBC).
The agarose matrix, A-15 m, was equilibrated in 100 mM acetic acid (pH 5.0), 0.02% sodium azide by repeated washing and the slurry was packed into an "upward flow" column (Pharmacia, 90 cm x 2.5 cm). DNA samples were applied as a 5% sucrose solution and cushioned into the column bottom by a "chase" consisting of 100 mM acetic acid pH 5.0, 0.02% sodium azide, 10% sucrose. The DNA was eluted upward at a flow rate of 10 ml/hr for 16 hr. Fractions of 1 ml volume were collected and monitored for the presence of plasmid DNA by absorbance at 260 nm. Fractions (usually between 34 and 43) containing plasmid DNA were pooled and concentrated by precipitation with ethanol. This procedure is economical but it suffers from the disadvantage that the plasmids are usually contaminated by trace amounts of chromosomal DNA. In addition, the quality of the DNA is less predictable with respect to the amount of nicking compared to the CsCl gradient procedure.

Small Scale Mini-Preparation

This procedure has been described by Maniatis et al. (1982) and is a modification of the method developed by Birnboim and Doly (1979). Cells from a 2 ml overnight culture were harvested in a 1.5 ml Eppendorf tube in an Eppendorf micro-centrifuge (as for all subsequent centrifugation steps) and the pellet was resuspended in 100 µl of 50 mM glucose, 25 mM Tris-Cl (pH 8.0), 10 mM EDTA. Lysozyme was omitted as it is unnecessary with most E. coli strains (except perhaps DH1). Two volumes of a freshly prepared solution of 0.2 M NaOH, 1% SDS were added and the lysate was briefly incubated on ice. A 150 µl volume of potassium acetate (pH 4.8) was added and the contents were gently mixed by inverting the tube several times. The tube was allowed to sit on ice for 15 minutes and then centrifuged at 4 °C at 16,000 rpm. The supernatant was transferred to another clean tube and extracted once with an equal volume of 1:1 phenol:chloroform (v/v) as described.
The phases were separated by a five minute centrifugation at room temperature and the aqueous phase was transferred to a clean tube followed by the addition of two volumes of 95% ethanol (room temperature). After a 5 minute incubation, the DNA was collected by a 15 minute centrifugation. The pellet was rinsed once with 70% ethanol and dried briefly in vacuo. The DNA was resuspended in 50 µl TE and 1-3 µl were used for restriction analysis.

ISOLATION OF TEMPLATE DNA FOR SEQUENCING

Double-Stranded DNA

Plasmids for sequencing were prepared by the small scale alkali lysis method as described by Birnboim and Doly (1979) and modified according to Birnboim (1983), Pelham (1985) and Hattori and Sakaki (1986). After the bacterial debris was precipitated with the aid of potassium acetate, 1/4 volume of a 10 M LiCl solution was added to the cleared lysate (Pelham, 1985; Birnboim, 1983). After incubating the tube on ice for 15 minutes, the rRNA was precipitated by centrifugation at 4 °C for 15 minutes in an Eppendorf centrifuge (as for all subsequent centrifugation steps). The supernatant was extracted once with an equal volume of 1:1 phenol:chloroform (v/v) and the DNA precipitated by ethanol as described. After the pellet was resuspended in TE, ribonuclease A was added to a final concentration of 40 µg/ml. The digest was incubated at 37 °C for 30 minutes, then 0.6 volume of 20% polyethylene-glycol (PEG-8000), 2.5 M NaCl was added. The tube was chilled on ice for 30-60 minutes and the DNA was then collected by a 5 minute centrifugation at room temperature (Hattori and Sakaki, 1986). The supernatant was removed with a drawn-out Pasteur pipet and the DNA pellet was rinsed once with 70% ethanol. After drying in vacuo, the pellet was resuspended in 50 µl TE.

Single-Stranded DNA

Single-stranded DNA templates were prepared from either the bacteriophage M13 (Messing, 1983) or the pEMBL plasmids (Dente et al., 1983).
DNA fragments of less than 1.0 kb were usually propagated in M13 vectors for sequencing. Larger fragments, however, were unstable and frequently suffered from deletions. It was this observation that led to the alternative use of pEMBL plasmids for cloning larger DNA fragments (reviewed by Dente et al., 1985). When cells harboring such plasmids are superinfected by the helper bacteriophage IR1, one strand of the plasmid can be packaged and extruded into the medium as virion capsids. Thus, in theory, DNA can be stably maintained in double-stranded form until the single-stranded template is needed. Unfortunately, colonies stored on plates (LB or M9 salts) at 4 °C overnight can often become erratic in both the efficiency of superinfection and the yield of virions.

Bacteriophage M13

A single plaque of M13 was picked with a sterile Pasteur pipet and inoculated into 1.5 ml of 2x YT with moderate shaking at 37 °C to elute the phage. Host cells (either E. coli strain JM101 or JM103) were freshly prepared by inoculating a single colony into 10-25 ml of YT. When a cell density of A550 = 0.6 was reached, a 20 µl aliquot was added to the eluted M13 above. The tube was shaken vigorously for 8-14 hours at 37 °C to ensure good aeration for phage growth. The culture was transferred to a clean 1.5 ml polypropylene tube and centrifuged for 5 minutes in an Eppendorf microcentrifuge (as for all subsequent centrifugation steps). The supernatant containing M13 virions (1.3 ml) was removed and added to 200 µl of 20% polyethylene-glycol (PEG-8000), 2.5 M NaCl. The tube was inverted several times to mix and then incubated at room temperature for 15 minutes. The virions were then collected by centrifugation for 5 minutes. The supernatant was removed completely with a flame-drawn Pasteur pipet and the pellet dispersed in 200 µl of TES (200 mM Tris-Cl, pH 8.0, 200 mM NaCl, 1 mM EDTA).
The bacteriophage was extracted twice with an equal volume of 1:1 phenol:chloroform (v/v) and the DNA precipitated by ethanol as described. The template was resuspended in 25-50 µl TE.

The pEMBL Plasmids

A single ampicillin resistant colony was inoculated into 1.5 ml of 2x YT containing 50-100 µg/ml of ampicillin. The tube was shaken vigorously at 37 °C until a cell density of A550 = 0.1-0.2 was reached. Assuming that 1.0 A550 = 7.5 x 10^[illegible] cells/ml, a twenty fold excess of the helper phage IR1 was added to ensure efficient infection. The culture was shaken vigorously for a further 4-6 hours to allow packaging and extrusion of single-stranded DNA as virion capsids. The virions were collected from the medium by precipitation with 20% PEG-8000, 2.5 M NaCl and the template DNA was prepared as described for M13. Since the packaging process is indiscriminate, half of the capsids should theoretically contain the single-stranded pEMBL plasmid (Dotto et al., 1981). In practice, however, this efficiency is often drastically reduced by the size of the insert (>5.0 kb) and the age of the cells before infection (>24 hr). In the latter case, the preparation of fresh transformants appears to be the only viable alternative (this laboratory and Luck, 1986).

Preparation of the Helper Phage IR1

A single colony of JM101 was inoculated into 5 ml of LB and incubated at 37 °C with vigorous shaking until A550 = 0.2. Approximately 1.5 x 10^[illegible] IR1 (obtained from Dr. Andrew Spence) was added to the culture and growth was continued until an A550 of 0.6 was reached. The culture was then inoculated into 250 ml of LB and allowed to grow at 37 °C with good aeration until saturation. The cells were removed by centrifugation in a GSA rotor at 10,000 rpm for 10 minutes. The supernatant, containing the free virions, was transferred to sterile bottles and stored at -70 °C without glycerol.

DNA SEQUENCING

Most of the DNA sequences were determined by the chain terminator method as originally described by Sanger et al.
(1977). This method has been generally applied to sequence determination using single-stranded templates. Recently, it has also been adapted for sequence determination involving double-stranded DNA molecules (Chen and Seeburg, 1985; Hattori and Sakaki, 1986). This latter development eliminates the necessity for obtaining inserts cloned in both orientations, as is the case for both M13 (Sanger et al., 1980; Messing, 1983) and pEMBL templates (Dente et al., 1983). Sequences can now be readily obtained from both strands of the DNA by using both the universal forward and reverse sequencing primers. The different procedures involved in template preparation for single-stranded and double-stranded DNA have been discussed above. While the actual sequencing conditions are almost identical, different treatments are required for annealing the sequencing primer to the two different types of templates. These procedures are discussed below.

The chemical degradation method of Maxam and Gilbert (1980) was also used in sequencing DNA fragments, provided that detailed restriction mapping information was available. However, this method is much more labor intensive and generally requires 5 to 10 times the amount of radiolabelled nucleotides to label a unique end of a restriction fragment for sequencing. High density polyacrylamide gels (12-20%) are often necessary to resolve sequences close to the labelled end and therefore cannot be easily dried down. The wet gels lead to more scattering of the radioactivity, resulting in a decrease in band resolution on the X-ray film. Furthermore, even with the aid of intensifier screens, the sequencing gels frequently require a much longer exposure time, particularly with sequences farther from the radiolabelled end.
Chain-Terminator Method

Single-Stranded DNA Templates

M13 and pEMBL Plasmids

About 0.5 µg of the template (5 µl), 0.5 to 1 pmole of sequencing primer (1 µl) and 2 µl of 10 x Hin buffer (100 mM Tris-Cl pH 7.6, 500 mM NaCl and 50 mM MgCl2; Messing et al., 1981) were mixed in a 1.5 ml Eppendorf polypropylene tube. The mixture was heated in a 65 °C water bath for 10 minutes and then placed inside a small test tube containing water at 65 °C. The annealing mix was allowed to cool slowly to room temperature for 15-20 minutes.

Double-Stranded DNA Templates

pUC13 and Double-Stranded pEMBL Plasmids

Plasmid DNA (1-2 µg) was denatured in 0.2 M NaOH at room temperature for 5-20 minutes. It was then neutralized with the addition of 2.5 M ammonium acetate (pH 7.5) and precipitated with the addition of two volumes of cold 95% ethanol. The pellet was rinsed once with cold 70% ethanol and then dried briefly under vacuum. The denatured DNA template was then annealed with 1 pmole of sequencing primer and 1 µl of 10 x Seeburg buffer (70 mM Tris-Cl pH 7.5, 70 mM MgCl2, 50 mM β-mercaptoethanol and 1 mM EDTA; Chen and Seeburg, 1985) in a final volume of 10 µl at 65 °C for 10 minutes and treated identically to single-stranded DNA templates in subsequent steps.

Chain Termination Reactions

Single-Stranded Templates

The sequencing reactions were performed as droplets inside a sterile petri plate to facilitate the handling of a large number of templates (courtesy of Dr. David Goodin). The E. coli Klenow fragment of DNA polymerase I (0.8-1.0 units, about 0.25 µl straight from the stock tube), 1.5 µl of [α-32P]dATP and 1 µl of a 15 µM solution of dATP were added to the annealed primer-template mix on ice. Aliquots of 2.5 µl were added to 2 µl of pre-distributed mixes of dideoxy- and deoxyribonucleotides (see Table II, top) and the reactions were initiated by incubation at 37 °C. After 15 minutes, 1 µl of a chase solution containing 0.5 mM of all four dNTPs was added to each reaction.
After a further 15 minute incubation at 37 °C, the reactions were stopped by adding 4 µl of a stop mix (90% formamide, 0.07% bromophenol blue, 0.07% xylene cyanol). The sequencing products were denatured by heating at 90 °C in a water bath for 3 minutes prior to loading onto the sequencing gels.

Double-Stranded Templates

For sequencing double-stranded templates, at least 2 units of Klenow polymerase and 2 µl of [α-32P]dATP were used. In addition, a reaction temperature of 42 °C appeared to reduce artifactual bands over long tracts of A/T rich sequences. However, at this higher temperature, 2 µl of a half-diluted chase solution was used to compensate for the increased evaporation of the sequencing droplets.

To exploit the high GC nucleotide content within the structural tRNA genes, [α-32P]dGTP was occasionally used as the radiolabelled nucleotide together with custom designed sequencing oligonucleotides that prime from within the tRNA genes. The relative concentrations of the deoxyribonucleotides and dideoxyribonucleotides in the sequencing mixes adjusted for use with [α-32P]dGTP are listed in Table II (bottom).

Purification of Radiolabeled Restriction Fragments For Maxam-Gilbert Sequencing

The 3' ends of restriction fragments for sequencing were labeled using 1-2 units of the Klenow fragment and 50-80 µCi of the [α-32P]dNTP appropriate to the restriction enzyme recognition site. The restriction fragments were resolved by gel electrophoresis in a 5% polyacrylamide gel. The appropriate bands were localized by autoradiography and excised from the gel matrix with a scalpel. The DNA fragments were eluted from the gel strip by soaking in 0.6 ml of 500 mM ammonium acetate, 10 mM magnesium acetate, 1 mM EDTA, 0.1% (w/v) SDS and 10 µg/ml E. coli tRNA carrier (Maxam and Gilbert, 1980) at 65 °C overnight in a 1.5 ml Eppendorf tube. The tube was briefly centrifuged and the supernatant was transferred to a clean 1.8 ml Eppendorf tube.
The DNA was recovered by precipitation in ethanol. When the restriction fragments were labeled at both of their 3' ends, the two labeled ends were separated either by cutting with a second restriction enzyme or by strand separation, heating at 90 °C for 2 minutes in 30% DMSO (v/v), 1 mM EDTA, 0.07% xylene cyanol and 0.07% bromophenol blue, followed by quick chilling in an ethanol-dry ice bath. The separated ends were then resolved again by gel electrophoresis in a non-denaturing 5% polyacrylamide gel. The resolved products uniquely labeled at one end were recovered as described above.

TABLE II. DEOXY-/DIDEOXYRIBONUCLEOSIDE TRIPHOSPHATE MIXES FOR CHAIN-TERMINATION SEQUENCING

[α-32P]dATP as the Radiolabelled Nucleotide

          G mix    A mix    T mix    C mix
ddGTP     89       -        -        -
ddATP     -        8.0      -        -
ddTTP     -        -        13       -
ddCTP     -        -        -        13
dGTP      1.5      21       30       30
dATP      -        -        -        -
dTTP      30       21       1.5      30
dCTP      30       21       30       2

This protocol was provided by Dr. Joan McPherson.

[α-32P]dGTP as the Radiolabelled Nucleotide

          G mix    A mix    T mix    C mix
ddGTP     20       -        -        -
ddATP     -        300      -        -
ddTTP     -        -        750      -
ddCTP     -        -        -        200
dGTP      -        -        -        -
dATP      113      8        160      160
dTTP      113      160      8        160
dCTP      113      160      160      10

This protocol was provided by Lawrence Shitnin.

All concentrations in the two protocols are in µM. For longer sequences, the final concentrations of the dideoxyribonucleoside triphosphates in the sequencing mixes were decreased by either 50% or 67%.

Maxam and Gilbert Sequencing Reactions

The sequencing reactions were performed according to Maxam and Gilbert (1980) with minor modifications by Dr. A. Delaney. The procedure is summarized in Table III.

TREATMENT OF GLASSWARE AND PLASTICWARE

Glass test tubes, capillaries, Pasteur pipets and Eppendorf tubes for use in the construction of genomic libraries were all treated with dichlorodimethylsilane as described in Maniatis et al. (1982). They were then rinsed exhaustively in distilled water, dried and sterilized. Eppendorf tubes and pipet tips for general DNA manipulations were used from unopened packages without sterilization.
TABLE III. DNA Sequencing Reactions by the Maxam-Gilbert Method

The four columns are the four base-specific reactions, in the order given in the original table.

[32P] DNA (µl):          5                   10                    10           5
Carrier DNA (1 mg/ml):   1 µl                1 µl                  1 µl         1 µl
                         200 µl cacodylate   10 µl dH2O            10 µl dH2O   15 µl 5 M NaCl
                         buffer
Add:                     1 µl DMS            3 µl 10% formic acid  30 µl HZ     30 µl HZ
Incubate:                6', RT              15', 37 °C            10', RT      10', RT
Add:                     50 µl G-stop        230 µl HZ-stop        200 µl HZ-stop   200 µl HZ-stop
95% ethanol (-70 °C):    1 ml                1 ml                  1 ml         1 ml

Steps common to all four reactions:

Incubate: -70 °C, 15 minutes
Microfuge: 5 minutes
0.3 M sodium acetate: 300 µl
95% ethanol: 1 ml
Reprecipitate
Wash pellet and dry in vacuo: 2 x in 1 ml 95% ethanol
Resuspend in: 27 µl dH2O, 3 µl 10 M piperidine
Strand scission: 90 °C, 45 minutes
Desiccate: overnight over P2O5 to collect piperidine
Resuspend in: 100 µl dH2O
Repeat desiccation

ISOLATION OF GENOMIC DNA FROM Drosophila

Three different methods have been applied to isolate genomic DNA from Drosophila. For most purposes, a quick method was adapted from the procedure of de Cicco and Glover (1983), which was designed for isolating DNA from a single fly, such as mutants that are difficult to grow. It suffers from the limitation of producing DNA that is too small for the construction of genomic libraries, but the DNA remains adequate for routine genomic Southern blots. The other two large scale methods (I and II) are more labor intensive but produce DNA that is sufficiently large (>120 kb) for the construction of either lambda or cosmid libraries. Protocol II of the large scale method gives dirtier DNA but is more expedient, and is therefore generally preferred.

Quick Method

A small number of adult flies (1-100) were placed in a 1.8 ml Eppendorf tube and homogenized in 100-200 µl of 10 mM Tris-Cl pH 7.5, 60 mM NaCl, 10 mM EDTA, 0.15 mM spermine, 0.15 mM spermidine and 5% sucrose. An equal volume of 1.25% SDS, 300 mM Tris-Cl pH 9.0, 100 mM EDTA and 5% sucrose was added and the homogenate incubated at 65 °C for 30 minutes.
A volume of 30-60 µl of 8 M potassium acetate was added and the mixture was chilled on ice for 45 minutes, then centrifuged at 12,000 x g for 10 minutes at 4 °C. The supernatant was transferred to a clean Eppendorf tube and extracted with an equal volume of 1:1 phenol:chloroform (v/v), and two volumes of 95% ethanol were added to precipitate the DNA. The pellet was washed twice with 70% ethanol, briefly dried in vacuo and then redissolved in 20-50 µl of TE. Ribonuclease A was added to a final concentration of 10-20 µg/ml and the sample was incubated at 37 °C for 30 minutes. Usually 5-10 µl was sufficient for a single restriction digest.

Large Scale Method I

Method I is a compiled adaptation from Holmgren (1984), Ish-Horowicz (1979), Kidd et al. (1983), Scott et al. (1983) and Maniatis et al. (1982). Adult fruitflies (1-2 g) were homogenized in 5 ml of 10 mM Tris-Cl pH 7.5, 60 mM NaCl, 10 mM EDTA, 0.15 mM spermine and 0.15 mM spermidine (4 °C in a Dounce homogenizer). The homogenate was filtered through two layers of Nitex screen to remove the large debris and the filtrate was centrifuged for 10 minutes at 4 °C in an SS34 rotor at 4000 rpm. The supernatant was discarded and the pellet dispersed thoroughly in 5 ml of 20 mM Tris-Cl pH 8.0, 100 mM NaCl and 10 mM EDTA. The cells (and nuclei) were lysed by rapid mixing with 1 ml of 10% SDS and then proteinase K was added to a final concentration of 200 µg/ml. The tube was incubated at 37 °C for one hour and then gently extracted three times with 1:1 phenol:chloroform (v/v), followed by two chloroform washes with minimum agitation. The aqueous phase was poured into dialysis tubing and dialyzed extensively against 50 mM Tris-Cl pH 8.0, 10 mM NaCl, 10 mM EDTA until the A270 of the dialysis buffer was less than 0.05 (Maniatis et al., 1982). The DNA was treated with ribonuclease A (100 µg/ml) at 37 °C for 1-3 hours and then with 100 µg/ml proteinase K for a further 60 minutes.
The DNA was phenol extracted and dialyzed as described above. It was concentrated by adding 1/5 volume of 2.5 M ammonium acetate, 100 mM MgCl2, 1 mM EDTA and 2 volumes of 95% ethanol. The pellet was washed once or twice with 70% ethanol and dried in vacuo. It was then rehydrated in a small volume of TE (0.5-1.0 ml) without disturbance for 2-4 days at 4 °C.

Large Scale Method II

High molecular weight DNA produced by this method, modified from McGinnis and Beckendorf (1983), contains trace amounts of eye and cuticle pigments (brown color) and is contaminated with some low molecular weight RNA. However, the impurities appear to be innocuous with respect to restriction digests and ligation efficiencies. About 1 g of adult flies was frozen in liquid nitrogen and ground to a fine powder using a mortar and pestle (Blin and Stafford, 1976). The powder was quickly transferred with a pre-chilled spatula or a small paint brush into 5 ml of solution A (30 mM Tris-Cl pH 8.0, 100 mM NaCl, 10 mM EDTA, 10 mM 2-mercaptoethanol and 0.5% Triton X-100 (w/v)) and vortexed vigorously for about 20 seconds. The debris was pelleted at 4000 rpm at 4 °C in an SS34 rotor for 10 minutes and then washed in 5 ml of solution B (100 mM Tris-Cl pH 8.4, 20 mM EDTA, 100 mM NaCl). The debris was pelleted again and then dispersed in 3 ml of the same buffer. The cells and nuclei were lysed by rapid ejection of 0.3 ml of 10% SDS and then proteinase K was added to a final concentration of 100 µg/ml. The tube was incubated at 50 °C for 1 hour and then gently extracted with an equal volume of 1:1 phenol:chloroform (v/v) as described. The aqueous phase was transferred to a sterile 30 ml Corex tube with a wide bore pipet and 2 volumes of 95% ethanol (-20 °C) were layered gently on top. The DNA was drawn out of the aqueous phase by rotating the tube gently at a 30° angle on ice until a stringy precipitate formed at the interface (D. Jones, personal communication).
The ethanol was replaced intermittently with fresh aliquots and the process was repeated until the aqueous phase had almost completely disappeared. The matted ball of DNA was retrieved with a siliconized glass hook and washed by repeated dunking in fresh 70% ethanol. Excess ethanol was removed by dabbing the ball of DNA along the inside of a clean Corex tube, with due caution not to dry the DNA completely. The DNA was then dispersed in 0.5-1.0 ml of TE at 65 °C for one hour. The recovery of DNA from both large scale methods is approximately 0.5-1.0 mg/g of flies.

Drosophila embryos frozen in liquid nitrogen were difficult to homogenize manually to a fine powder. They were therefore homogenized directly in 5 ml of solution A on ice in a Dounce homogenizer and subsequently treated as described in method II above. The recovery of DNA is usually 1.5-3 mg/g (wet weight) of embryos.

PARTIAL DIGESTION OF GENOMIC DNA FOR LIBRARY CONSTRUCTION

D. melanogaster Genomic DNA

Libraries suitable for chromosomal walking were constructed by partial cleavage of D. melanogaster genomic DNA with the restriction enzyme MboI. This enzyme recognizes and cuts at the tetranucleotide sequence -GATC- which, in theory, should occur once in every 256 base pairs. This sequence was assumed to occur with sufficient frequency within the D. melanogaster genome to permit the generation of a pseudo-random set of overlapping fragments representative of the entire genome. The rate of MboI digestion was ascertained empirically in a series of preliminary experiments consisting of 25 µl of genomic DNA and 0.5 units of MboI in a final volume of 100 µl of 1 x MboI restriction buffer. Aliquots of the digest (20 µl) were removed at various time intervals and the reactions were terminated by adding 1 µl of 0.5 M EDTA and 5 µl of 25% Ficoll, 0.07% bromophenol blue, 0.07% xylene cyanol. The extent of the digest was analyzed by electrophoresis in a 0.2-0.3% agarose gel.
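The "once in every 256 base pairs" figure follows from assuming that the four bases occur independently and with equal frequency, so that one specific k-base site is expected once every 4^k positions. The sketch below works through that arithmetic, together with the enzyme scale-up implied by holding the pilot ratio of 0.5 units per 25 µl of DNA constant in a larger digest. The function names are hypothetical, and real genomes depart from equal base frequencies, so the spacing is only a first approximation.

```python
def expected_site_spacing(site_length, base_freq=0.25):
    """Mean distance (bp) between occurrences of one specific site,
    assuming independent bases that each occur with frequency base_freq."""
    return 1.0 / (base_freq ** site_length)


def scaled_units(pilot_units, pilot_dna_ul, prep_dna_ul):
    """Enzyme units that keep the pilot units-per-DNA-volume ratio."""
    return pilot_units * prep_dna_ul / pilot_dna_ul


# GATC is a 4-base site.
print(expected_site_spacing(4))

# Pilot digest: 0.5 units per 25 ul of DNA, scaled to a 750 ul digest.
print(scaled_units(0.5, 25, 750))
```

Under the same assumption a 6-base site would be expected about once every 4096 bp, which is why a 4-base cutter is the more suitable choice for generating a near-random set of overlapping fragments.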
For the preparative reaction, 750 µl of the genomic DNA was digested in 3 ml of 1 x MboI restriction buffer, maintaining the same ratio of enzyme units to DNA volume. However, as a precaution against over-digestion, the predetermined time points for obtaining the optimal size range of DNA were reduced by 50% (Seed et al., 1982). Three aliquots of 250 µl were collected and 10 µl from each were analyzed as above by gel electrophoresis to ensure that the DNA was digested to the correct extent. The three aliquots were pooled, gently extracted three times with 1:1 phenol:chloroform (v/v) and then precipitated with 2 volumes of 95% ethanol. The pellet was then slowly redissolved in 0.5 ml of TE.

D. erecta and D. yakuba Genomic DNAs

Approximately 30-45 µl (~35 µg) of genomic DNA from each of the two Drosophila species was digested with 5 units of BamHI in 100 µl of 1 x BamHI buffer at 37 °C. Aliquots of 25 µl were removed every 30 minutes and the reaction stopped by adding 1 µl of 0.5 M EDTA. The four aliquots were pooled, extracted with an equal volume of 1:1 phenol:chloroform (v/v) and ethanol precipitated as described. The pellet was dispersed in 100 µl of 1 x CIP buffer (50 mM Tris pH 9.0, 1 mM MgCl2, 0.1 mM ZnCl2 and 1 mM spermidine) and incubated with 0.3-0.5 units of CIP per µg of DNA at 45 °C for 30 minutes. The DNA was extracted with 1:1 phenol:chloroform (v/v), ethanol precipitated, and redissolved in TE at a final concentration of 0.125-0.5 mg/ml as estimated by gel electrophoresis against λ standards. The DNA was then ligated to λEMBL3 arms without prior size fractionation.

SIZE FRACTIONATION OF D. MELANOGASTER DNA

NaCl Linear Gradient

The D. melanogaster genomic DNA partially digested with MboI was fractionated on 13 ml NaCl linear gradients (1.25-5 M NaCl in TE) formed by a Hoefer multiple sucrose gradient maker (Dillelo and Woo, 1985). The gradients were centrifuged at 39,000 rpm in a SW40.1 rotor for 35 hr at 18 °C.
One-ml fractions were collected and diluted with an equal volume of TE. The DNA was precipitated by adding two volumes of 95% ethanol and centrifuging for 1 hr in a SW40.1 rotor at 20,000 rpm. The pellets were resuspended in 250 µl of TE and 10 µl from each fraction was analyzed by electrophoresis in a 0.2% agarose gel (fig. 7). The appropriate size fractions were pooled (15-23 kb for lambda and >35 kb for cosmid libraries) and dialyzed against 4 liters of TE. The DNA was then precipitated by adding 0.5 volume of 7.5 M ammonium acetate and 2 volumes of 95% ethanol. The pellet was solubilized in 20-30 µl of TE and a small amount (1-2 µl) was used to determine the concentration by A260 or by gel electrophoresis against λ standards of known concentrations.

Fig. 7. Fractionation of MboI partial digest of Oregon-R DNA by NaCl gradient. A typical example is shown here, in which the density of the gradient increases from fraction 1 to fraction 12. Small amounts of the DNA from the 1 ml fractions were analyzed by electrophoresis in a 0.2% agarose gel. DNA fragments in the range of 15 kb to 35 kb were pooled for the construction of λ libraries (fractions 6 and 7) and fragments larger than 35 kb were pooled for the construction of cosmid libraries (fractions 8, 9 and 10). The exact amounts of DNA recovered from the fractions were determined by A260 and/or by electrophoresis against a known amount of standard, usually λ DNA. The two lanes "A" are uncut λ (50 kb) and lane "B" is HindIII-cut λ; the sizes of the fragments liberated (50, 23, 9.5 and 6.5 kb) are shown on the right edge of the figure. Not shown in the figure are additional size markers generated by cutting λcIts857 with SalI, which liberates two fragments of 35 kb and 15 kb.

Gel Fractionation

DNA has also been successfully fractionated by agarose gel electrophoresis, but this method is not applicable to the construction of cosmid libraries, where gentler treatments are preferred because of the stringent requirement for large DNA. After digestion with restriction enzyme, the DNA was extracted with 1:1 phenol:chloroform (v/v) and ethanol precipitated, and the pellet was resuspended in 100 µl TE. The DNA was loaded into several slots in a 0.5% mini-agarose gel containing 1 µg/ml ethidium bromide. Electrophoresis was carried out in the dark and the DNA was inspected with the aid of a long-wave (365 nm) UV lamp. The gel region containing DNA in the 9.5-23 kb range was excised; the gel slice was placed inside dialysis tubing and the DNA eluted by electrophoresis for 2 hours. The supernatant was extracted several times with n-butanol to remove the ethidium bromide and to concentrate the DNA. It was then passed through siliconized glass wool to remove debris, extracted with 1:1 phenol:chloroform (v/v) and ethanol precipitated as described. The pellet was resuspended in 10 µl of TE and the final concentration was estimated by agarose gel electrophoresis against known concentrations of λ standards.

CONSTRUCTION OF GENOMIC LIBRARIES

Choice of Cloning Vectors

Both cosmid and lambda vectors have been utilized to construct "walking" genomic libraries. The 9.2 kb cosmid vector cosPneo, designed specifically as a Drosophila shuttle vector, was used to construct the cosmid walking library (Steller and Pirrotta, 1985). The cosmid contains the transposable P-element to permit direct germline transformation, a feature which is extremely useful in identifying genes by mutant rescue even when the gene products are unknown (Haenlin et al., 1985). The selectable marker, neo, confers resistance to neomycin (or its analog G418) on the larval progeny of transformed flies (Steller and Pirrotta, 1985); since this phenotype is a gain of novel function, an unconstrained variety of fly strains (a convenient feature in mutant rescue), or perhaps species, could theoretically serve as recipients (fig. 8A). In contrast, with vectors utilizing ry (Rubin and Spradling, 1983), Adh (D. A. Goldberg et al., 1983) or w (Klemenz et al., 1987) as selectable markers, only the corresponding mutants can be used as recipients. For a smaller cosmid library, the 5.2 kb cosmid pJB8 developed by Ish-Horowicz and Burke (1981) was used to clone BamHI-digested D. melanogaster DNA.

Replacement vectors λEMBL3 and λEMBL4 (Frischauf et al., 1983) and λ2001 (Karn et al., 1984), which contain polylinkers flanking the middle "stuffer" fragment, were used as lambda cloning vehicles (fig. 8B).

Fig. 8. Restriction maps of cosPneo and λEMBL3. (A) The polylinker cloning sequence in cosPneo is shown as a series of elevated restriction sites. Note that not all of them are unique (e.g. HindIII). There are three cos sequences in the vector, allowing efficient cloning of a greater size range of genomic DNA. bla indicates the β-lactamase gene. The two P-element terminal repeats are indicated as boxes with darkened triangles; ori indicates the origin of replication in E. coli; hsp70 indicates the Drosophila heat shock gene; and the marker, neo, indicates the neomycin phosphotransferase gene used to select for transformants on the basis of their resistance to G418. [After Steller and Pirrotta (1985).] The two arms are generated by cutting with BamHI within the polylinker sequence and with HpaI nestled among the cos sites. The HpaI site may be substituted by other convenient unique restriction sites as long as both arms retain at least one cos sequence. (B) Most of the overlapping Oregon-R genomic clones from 31 were obtained from a library constructed using the versatile vector λEMBL3. The polylinker sequence is shown as elevated restriction sites. In the λEMBL4 cloning vector, the polylinkers are in the opposite orientation. In the new version, known as λ2001, the polylinkers contain sites for XbaI, BamHI, HindIII, EcoRI, SstI and XbaI. L arm and R arm denote the left and right arms of the phage, respectively. [After Frischauf et al. (1983).]
These improved vectors are designed to select against wild-type phages at two levels during library construction and thus circumvent the need to purify the vector arms (Karn et al., 1983). First, two different restriction enzymes are used to cut within the polylinkers. Religation of the "stuffer" fragment to the vector arms can then be prevented by selective removal of the excised linker by isopropanol precipitation. Second, the "stuffer" fragment in the λ vectors carries the gam⁺ function, and gam⁺ phages are unable to form plaques on E. coli strains lysogenic for phage P2 (Spi⁺). Although this phenomenon is not well understood, it has been exploited as an additional selection step against the wild-type vector. If the central fragment is replaced by genomic DNA, the phage becomes insensitive to P2 inhibition (Spi⁻) and should grow on P2 lysogens.

Bacteriophage Lambda

Large-Scale Lambda Preparation

Large-scale growth of bacteriophage lambda was based on the method developed by Yamamoto et al. (1970) and modified by Maniatis et al. (1982). E. coli host strains LE392 or Q358 were grown overnight at 30-37 °C in 20-50 ml of NZYM. The cells were harvested by centrifugation at 4000 rpm in an SS34 rotor for 10 minutes and then resuspended in 0.4 volume of 10 mM MgCl2. Approximately^ cells were mixed with 10* bacteriophages and the suspension was incubated at 37 °C with intermittent agitation. After 20 minutes, the infected cells were inoculated into 500 ml of NZYM medium pre-warmed to 37 °C. The culture was shaken vigorously at 37 °C to allow concomitant growth of both the cells and the bacteriophage. After the cells had lysed to release the phage particles (indicated by considerable bacterial debris), 1-2 ml of chloroform was added to the culture and the incubation continued for another 15 minutes to complete the lysis.
The cell debris was removed by centrifugation at 7000 rpm in a GSA rotor for 15 minutes at 4 °C and the phage supernatant was transferred to clean centrifuge tubes containing 29.2 g of NaCl (final concentration 1 M) and 50 g of PEG-8000 (final concentration 10%). The tubes were mixed at 37 °C with slow shaking until the contents had dissolved and were then left overnight at 4 °C. The bacteriophage was precipitated by centrifugation at 7000 rpm in a GSA rotor as described and the pellet was dispersed in 6 ml of SM buffer (10 mM Tris-Cl pH 7.5, 100 mM NaCl, 10 mM MgCl2 and 0.02% gelatin). The suspension was digested with DNase I (10 µg/ml) and RNase A (20 µg/ml) for 30 minutes at 37 °C. An equal volume of chloroform was added and the suspension was cleared by centrifugation at 10,000 rpm in an SS34 rotor at 4 °C for 5 minutes.

CsCl Gradient Purification of Live Lambda Bacteriophage

To purify the bacteriophage further, 0.6 g of solid CsCl was added to each ml of supernatant. The phage suspension was transferred to 13 mm x 51 mm Beckman Quick-Seal tubes and centrifuged at 65,000 rpm in a VTi65 rotor at 4 °C for 60 minutes. The particles (bluish band) were retrieved with a 3 ml syringe fitted with a 21-gauge needle and the CsCl was removed by dialysis against 1000 volumes of 10 mM Tris-Cl pH 8.0, 10 mM NaCl, 10 mM MgCl2 (two changes, one hour each). Bacteriophage protein was removed by adding 20 mM EDTA, 0.5% SDS and 50 µg/ml proteinase K. After incubation at 65 °C for one hour, the supernatant was extracted several times with 1:1 phenol:chloroform (v/v) and the phases were separated by brief centrifugations as described. The aqueous phase was transferred to a dialysis sac with a wide-bore pipet and dialyzed extensively against TE.

Preparation of Lambda Vector Arms

Approximately 20 µg of λEMBL3, λEMBL4 or λ2001 vector was digested with 10 units of BamHI in 50 µl of 1 x BamHI buffer for 1 hour at 37 °C.
Another 10 units of enzyme was added and the DNA was digested for another 30 minutes. Small aliquots of the digest were analyzed in an agarose gel to ensure that the digestion was complete. The DNA was extracted with 1:1 phenol:chloroform (v/v) and ethanol precipitated as described. The pellet was resuspended in 50 µl of 1 x EcoRI buffer and 10 units of EcoRI was added to the tube. After 1 hour at 37 °C, another 10 units were added and incubation was continued for 30-60 minutes. The digest was extracted with phenol:chloroform (1:1 v/v) as before and sodium acetate was added to a final concentration of 0.3 M, followed by 0.6 volume of isopropanol. The tube was incubated on ice for 15 minutes and then centrifuged for 5 minutes in an Eppendorf microfuge. The pellet was resuspended in 200 µl TE and the isopropanol precipitation step repeated. The excised linkers (<10 bp) should remain in the supernatant and therefore be selectively eliminated during the precipitation steps (Frischauf et al., 1983). The pellet was washed 1-2 times with 70% ethanol, dried in vacuo and then resuspended in 40 µl of TE.

Preparation of Cosmid DNA

Single ampicillin-resistant colonies of the E. coli strain DH1 transformed with cosPneo (a gift from Dr. V. Pirrotta) and pJB8 (purchased from Amersham) were inoculated into 10 ml of LB supplemented with 100 µg/ml of ampicillin and grown overnight to saturation at 37 °C. Approximately half of each overnight culture was used to inoculate a 500 ml culture grown in the same medium. The large-scale alkaline isolation and purification of the cosmid DNA by CsCl gradient centrifugation were performed as described (see CsCl Gradient Purification of DNA, p. 60).

Preparation of Cosmid Vector Arms

Approximately 40 µg of cosPneo was linearized at the unique HpaI site in 200 µl of 1 x HpaI buffer (Steller and Pirrotta, 1985) (see fig. 8A).
To prevent religation at this site during subsequent steps, 2.5 units of calf intestinal phosphatase were added directly to the restriction digest and the tube was transferred to 45 °C for 30 minutes. The DNA was extracted with phenol:chloroform (1:1 v/v) and ethanol precipitated as described, and the pellet was resuspended in 200 µl of 1 x BamHI buffer. The DNA was cut at the BamHI site within the polylinker with about 40 units of BamHI for 2 hours to generate two cosmid arms of 4.2 and 5.0 kb. The efficiency of each step above was ascertained either by agarose gel electrophoresis or by ligation and transformation of E. coli strain DH1 (fig. 9, left panel).

Cosmid arms from pJB8 were prepared essentially as described by Ish-Horowicz and Burke (1981). The vector (20 µg) was linearized at either the HindIII or SalI site in 100 µl of the appropriate restriction buffer with 20 units of enzyme. The reaction was terminated by heating at 68 °C for 15 minutes and the DNA was dephosphorylated with 5 units of calf intestinal phosphatase at 45 °C for 30 minutes to prevent the formation of tandem vectors. The DNA was extracted with phenol:chloroform (1:1 v/v) and ethanol precipitated, followed by resuspension in 100 µl of 1 x BamHI buffer. The vector was cleaved with 10 units of BamHI for 6 hours and the reaction terminated by extraction with an equal volume of phenol:chloroform (1:1 v/v) and precipitation with ethanol. The pellet was redissolved at a final concentration of 0.5 mg/ml. An equimolar mixture of the BamHI + HindIII cleaved and the BamHI + SalI cleaved pJB8 was used for cloning.

Fig. 9. Systematic testing of the intactness of the restriction ends in both vector and genomic DNAs before packaging. In the left panel, the cosPneo vector is tested for the efficiencies of BamHI cutting and of phosphatase treatment at the HpaI ends. Lane 1: cosPneo cut with HpaI and treated with calf intestinal phosphatase. Lane 2: as above, except the DNA was incubated with 2 units of T4 DNA ligase overnight. The inability of the vector to form concatamers shows that the phosphatase treatment is essentially complete. Lane 3: as in lane 1, except the vector is further cleaved with BamHI to liberate the two vector arms of 4.2 kb and 5 kb. Lane 4: as in lane 3, except the DNA was incubated with 2 units of T4 DNA ligase overnight. Approximately 95% of the DNA was re-ligated at the BamHI site. A residual amount of vector arms refractory to ligation was consistently observed over several tries, suggesting that the BamHI probably contained trace amounts of a contaminating nuclease. Lane 5: approximately 0.2 µg of vector arms were mixed with 0.2 molar equivalents of tester "insert" DNA generated by MboI digest of λ47.1. Lane 6: as in lane 5, except the DNA mixture was incubated overnight with 2 units of T4 DNA ligase in a 10 µl volume. The results showed that under these conditions, ligation is efficient. The right panel shows the results of testing the intactness of the MboI ends in the genomic DNA. Lane 7: 0.25 µg of fractionated genomic DNA was mixed with a 10-fold molar excess of vector arms. Lane 8: identical to lane 7, except 2 units of T4 DNA ligase was incubated with the DNA overnight in a 10 µl volume. The conversion of the genomic DNA to high molecular weight material shows that the MboI sites of the DNA remained intact through the several steps of preparatory manipulations. The 9.5 kb band is religation of the excess cosmid arms. The two lanes "A" are HindIII-generated λ size markers, the sizes of which are shown on the right edges of both panels. The ligation of MboI genomic partial digest to λ vectors was also tested in the same way before packaging (data not shown).

Ligation of Lambda Vector Arms to Drosophila DNA

D. melanogaster Libraries

A two-fold molar excess of λEMBL3 vector arms was mixed with 15-35 kb MboI partially digested Drosophila DNA (Maniatis et al., 1982).
The final concentration of DNA was adjusted to approximately 400 µg/ml in 1 x T4 DNA ligase buffer and 2-3 units of T4 DNA ligase were added to the reaction. The ligation mix (10-20 µl) was drawn into a siliconized glass capillary and the ends were sealed by heating in a flame. The ligation reaction was carried out at 14-16 °C, usually for 16-24 hours, and the extent of the reaction was analyzed by electrophoresis of a small aliquot in a 0.3% agarose gel. Successful ligation was characterized by conversion of the discrete lambda arms and genomic DNA into high molecular weight DNA of more than 100 kb (concatameric form); the ligation mix should also be viscous when pipetted. A small aliquot of the ligation mix (0.5 µg) was test packaged with extracts prepared as described below. The proportion of recombinant phages relative to religated wild-type vectors was measured by infecting the E. coli strains Q358 and Q359 (P2 lysogen). Efficiencies ranging from 75-90% were routinely obtained. The ligation of unfractionated EcoRI-digested genomic DNA to λEMBL4 arms was performed under conditions similar to those described above.

Drosophila Sibling Species Libraries

Approximately 0.5 µg and 0.25 µg of BamHI partially cleaved genomic DNA from D. erecta and D. yakuba, respectively, were ligated to 1 µg of λEMBL3 arms with 2 units (0.5 µl) of T4 DNA ligase in a final volume of 10 µl of 1 x ligase buffer as described. Ligation for the D. teissieri library was performed in a 5 µl volume of 1 x ligase buffer containing 2 µg of λ2001 vector arms, 0.25 µg (2 µl) of BamHI insert DNA and 2 units of T4 DNA ligase.

Ligation of Cosmid Arms to D. melanogaster DNA

cosPneo Vector

An approximately 10-fold molar excess of cosPneo arms was ligated to 35-50 kb MboI partially digested genomic DNA. The final DNA concentration in the ligation reaction was adjusted to 225 µg/ml in 1 x T4 DNA ligase buffer. A small aliquot (1-2 µl) was removed and stored at 4 °C as a control.
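The molar excess of vector arms specified above can be converted into a mass ratio, because for double-stranded DNA the number of molecules is proportional to mass divided by length. A minimal Python sketch of that conversion follows. The function name, the 42.5 kb mean insert size (the midpoint of the 35-50 kb range) and the treatment of the 4.2 kb and 5.0 kb arms as a single 9.2 kb vector equivalent are illustrative assumptions, not values or procedures stated in the protocol.

```python
def vector_mass_for_molar_excess(insert_ug: float,
                                 insert_kb: float,
                                 vector_kb: float,
                                 molar_excess: float) -> float:
    """Mass of vector arms (ug) supplying `molar_excess` vector equivalents
    per insert molecule.

    Moles of double-stranded DNA are proportional to mass / length, so the
    average mass per base pair cancels out of the ratio.
    """
    if min(insert_ug, insert_kb, vector_kb, molar_excess) <= 0:
        raise ValueError("all arguments must be positive")
    return molar_excess * insert_ug * vector_kb / insert_kb


# Hypothetical example: 1 ug of insert with an assumed mean size of 42.5 kb,
# both cosPneo arms counted together as one 9.2 kb equivalent,
# and a 10-fold molar excess of arms over insert.
arms_ug = vector_mass_for_molar_excess(1.0, 42.5, 9.2, 10.0)
print(f"{arms_ug:.2f} ug of arms per ug of insert")
```

Because the inserts are several times longer than the vector, a 10-fold molar excess corresponds to only about a two-fold excess by mass under these assumptions.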
T4 DNA ligase (2-3 units) was added to the rest of the ligation mix, which was gently mixed by pipetting. The ligation mix was transferred to a siliconized glass capillary and the reaction was performed as described above. Unlike ligation of genomic DNA to λ vector arms, viscosity is not a reliable indicator of ligation efficiency here. The efficiency must instead be ascertained by electrophoretic analysis of 1-2 µl of the reaction alongside the control in a 0.3% agarose gel (fig. 9, right panel).

pJB8 Vector

The ligation conditions were similar to those described above except that, in this case, the D. melanogaster DNA was completely cleaved with BamHI prior to the ligation step.

IN VITRO PACKAGING OF BACTERIOPHAGE λ AND COSMID DNA

Two slightly different systems have been used interchangeably throughout this work to regenerate phage particles in vitro from recombinant λ and cosmid molecules. A popular method involves preparation of whole-cell extracts from pairs of E. coli K-derived λ lysogens that have complementary defects in the λ packaging proteins. When the lysogens are combined, the added λ or cosmid recombinant molecules (and a low level of endogenous prophage DNA) are packaged by the full complement of bacteriophage proteins. A recent report, however, showed that extracts prepared from these strains are contaminated with the EcoK restriction system, a hidden variable which can contribute to the loss of up to 80% of the unmodified recombinant molecules during in vitro packaging (Rosenberg, 1985). Recently, a much improved "cos⁻ system" utilizing only a single E. coli B-derived λ lysogen that lacks this packaging bias has been constructed (Rosenberg et al., 1985). Furthermore, the endogenous λ prophage is disarmed by a deletion in the cos sequence, and thus cannot be packaged by the crude extract.
Small aliquots of the ligation mixes were always test packaged using whole-cell lysates prepared as described below before embarking on a full-scale experiment using the more expensive Gigapak cell-free extracts, which are usually 10 to 30 times more efficient (1-3 x 10^9 pfu/µg λ DNA). However, the pJB8 and λEMBL4 D. melanogaster libraries, composed of approximately 150,000 recombinant clones each, were collected from test packaging experiments alone, without resorting to the Gigapak system.

Freeze-Thaw Two-Strain Packaging Extracts

The procedure is adapted from the "freeze-thaw protocol I" in Maniatis et al. (1982) with the minor modifications described below. Single colonies of the two E. coli strains NS428 (Aam) and NS433 (Eam) were inoculated into 10 ml of M9 medium (supplemented with 2% casamino acids) and incubated at 32 °C overnight (Sternberg et al., 1977). A small aliquot from each overnight culture was inoculated into a separate flask of 100 ml of M9 to an initial A600 = 0.1, and the cells were grown at 32 °C with vigorous shaking until the A600 of each culture was approximately 0.3 (mid-log phase). The lysogens were induced by immersing the flasks in a 65 °C water bath until the internal temperature, as measured with a submerged alcohol-sterilized thermometer, reached 45 °C. The flasks were transferred quickly to a 45 °C shaker for 15 minutes and the cells were then incubated at 39 °C until they approached stationary phase (2 hours or about A600 = 1.0). A small sample of the lysogens was transferred to a glass test tube and a drop of chloroform was added. Successful induction of the lysogens was indicated by rapid cell lysis (Maniatis et al., 1982). The two cultures were then mixed together and rapidly cooled by swirling in an ice bath for 5 minutes. The cells were pelleted at 4000 rpm in a GSA rotor for 10 minutes at 4 °C and then redispersed in 100 ml of ice-cold M9 lacking casamino acids.
The cells were harvested again by centrifugation as described above and then thoroughly resuspended in 1 ml of CH buffer. Aliquots (50 µl) were dispensed into prechilled 1.5 ml Eppendorf tubes and flash frozen in liquid nitrogen. Efficiencies of these extracts were routinely between 0.5-1.0 x 10^8 plaques per µg of input "wild type" λ DNA (cIts857).

In vitro Packaging Using the Two-Strain System

A tube of the freeze-thaw extract was slowly thawed on ice for 3 minutes; <0.5 µg of DNA (λ or cosmid) in 66 mM Tris-Cl pH 7.9, 10 mM MgCl2, 1.5 µl of 0.1 M ATP and an empirically determined volume of CH buffer (20-25 µl) were added and mixed thoroughly with a glass rod. The packaging reaction was incubated in a 37 °C water bath for 60 minutes. Another tube of packaging extract was thawed on ice; 1 µg of DNase I and 2.5 µl of 1 M MgCl2 were added to it, and 20 µl of this second extract was added to the packaging reaction. The addition of a second portion of extract improved the efficiency of packaging 2- to 5-fold. After a further 30 minute incubation, 0.9 ml of SM buffer and a few drops of chloroform were added. The tube was vortexed gently and then centrifuged for 2-3 minutes in a microfuge. The supernatant was transferred to a clean 1.5 ml Eppendorf tube and stored over a few drops of chloroform at 4 °C.

cos⁻ Packaging Extracts

The E. coli B-derived strain SMR10 (a gift from Dr. F. Stahl) was grown in 100 ml LKB medium and induced as described above (see also Rosenberg et al., 1985). After induction, the cells were harvested by centrifugation as above and the pellet was resuspended in 9 ml of TSP (40 mM Tris-Cl pH 7.8, 10 mM spermidine and 10 mM putrescine) in the cold. The cells were pelleted again and then dispersed in 0.1 ml of TSP with the aid of a pipet. Aliquots of 20 µl of the concentrated cells were distributed to sequentially numbered 1.5 ml Eppendorf tubes containing 5 µl of 50% DMSO, 7.5 mM ATP, pH 7.0 and flash frozen in liquid nitrogen.
The tubes were stored at -80 °C for up to two weeks. The in vitro packaging efficiencies of the cos⁻ extracts ranged from 0.4-1.0x10** per µg of input wild-type λ DNA (cIts857). However, it was noted that the efficiency was somewhat reduced in the extracts that were flash frozen last. Attempts to stabilize the extracts by including 5%, 10% or 25% sucrose in the TSP were not successful.

In vitro Packaging Using cos⁻ Extracts

DNA to be packaged was resuspended in 10 mM Tris-Cl pH 8.0, 50 mM KCl and 1 mM EDTA. After addition of the DNA to the just-thawed extract, the tube was transferred to a 37 °C water bath for 60 minutes. SMC buffer (0.5 ml; 0.7% Na2HPO4, 0.3% KH2PO4, 0.05% NaCl, 0.01% NH4Cl, 1 mM MgCl2, 0.1 M CaCl2, 50 µg/ml DNase I) was then added and the tube was gently vortexed to disperse the pellet. The cell debris was removed by a brief centrifugation and the supernatant stored at 4 °C over a drop of chloroform.

AMPLIFICATION OF GENOMIC LIBRARIES

E. coli strain Q359 for propagating the λ libraries was grown overnight in NZYM supplemented with 0.4% maltose to induce high-level expression of the λ receptor gene lamB. The cells were then concentrated 2.5-fold in 10 mM MgCl2 as described (see Large-Scale Lambda Preparation, p. 79). A small volume of host cells (1.0 ml) was infected with 15,000 to 20,000 plaque-forming units at room temperature for 5 minutes to coordinate attachment of the phage and then transferred to a 37 °C water bath for 20 minutes to allow injection of the phage DNA into the host. Soft agar (7.5 ml) was added, the suspension was plated onto 150 mm NZYM plates, and the phages were grown overnight at 37 °C. SM buffer (10-12 ml) was added to the plates and the phage eluted by slow diffusion at 4 °C for several hours. The supernatant was collected and debris removed by centrifugation in an SS34 rotor at 7000 rpm for 20 minutes.
Aliquots were stored at -70 °C in the presence of 7% DMSO, or at 4 °C with a few drops of chloroform as preservative. For the λEMBL3 "walking" library, a total of at least 300,000 unique plaques were collected. For the Drosophila sibling species libraries, approximately 250,000, 40,000 and 250,000 unique plaques were obtained for D. erecta, D. yakuba and D. teissieri, respectively.

The recA E. coli strain DH1 was used to propagate the cosmid libraries. The cells were grown overnight and concentrated 20-fold in 10 mM MgCl2 as described. Twenty-five µl of the cells were adsorbed with 10,000-20,000 packaged cosmids at room temperature and then at 37 °C as above. After infection, 0.5 ml of LB medium was added and the cells were incubated at 37 °C for 45 minutes to allow expression of the β-lactamase gene. The cells were then concentrated by centrifugation (30 seconds in the microfuge), resuspended in a small volume (100-200 µl) of 0.5% NaCl and plated on LB-glucose plates supplemented with 40 µg/ml of ampicillin. The presence of glucose helps to inhibit the growth of fortuitously packaged endogenous λ DNA. The ampicillin-resistant colonies were pooled by washing the plates with 0.5% NaCl, collected by centrifugation in an SS34 rotor at 6,000 rpm for 10 minutes and resuspended in LB-glucose supplemented with 40 µg/ml ampicillin and 15% glycerol. For the cosPneo "walking" library, at least 500,000 unique colonies were obtained at an efficiency of 4 x 10^5/µg genomic DNA.

PREPARATION OF RADIOLABELLED PROBES

Nick-Translation

Radiolabelled probes were prepared by nick-translation (Rigby et al., 1977) with minor modifications (Dr. Ross MacGillivray, personal communication). DNase I (1 mg/ml) was freshly diluted in 10 mM Tris-Cl pH 7.5, 5 mM MgCl2 and 1 mg/ml BSA to a final concentration of 10 µg/ml and incubated on ice for 20 minutes.
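The clone counts reported above for the amplified libraries can be put in context with the standard Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert and P is the probability that any given sequence is represented. The Python sketch below applies it to the λ "walking" library. The haploid genome size (1.65 x 10^8 bp) and the 19 kb mean insert (the midpoint of the 15-23 kb fraction) are assumed round figures chosen for illustration; neither is stated in this section, and the calculation treats clones as a random sample of the genome.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of independent clones required
    so that a given sequence is present with the stated probability."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    f = insert_bp / genome_bp  # fraction of the genome in one clone
    return math.ceil(math.log(1.0 - probability) / math.log1p(-f))


def coverage_probability(n_clones: int, genome_bp: float, insert_bp: float) -> float:
    """Probability that a given sequence is present among n_clones random clones,
    computed as 1 - (1 - f)^n in a numerically stable form."""
    f = insert_bp / genome_bp
    return -math.expm1(n_clones * math.log1p(-f))


GENOME_BP = 1.65e8        # assumed haploid genome size (not from the text)
MEAN_INSERT_BP = 19_000   # assumed midpoint of the 15-23 kb size fraction

n99 = clones_needed(GENOME_BP, MEAN_INSERT_BP, 0.99)
p_library = coverage_probability(300_000, GENOME_BP, MEAN_INSERT_BP)
print(n99, p_library)
```

Under these assumptions the 99% threshold is reached at a few tens of thousands of clones, so a library of 300,000 independent plaques would be expected to contain essentially any single-copy sequence, provided the sequence is clonable and cloning is unbiased.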
DNA (0.5-1 µg) was added to a 50 µl cocktail containing 50 mM Tris-Cl pH 7.5, 5 mM MgCl2, 10 mM β-mercaptoethanol, 0.02 mM dGTP, 0.02 mM dTTP, 14 mM dATP, 14 mM dCTP, 0.02 mM CaCl2, 35-70 pmoles each of [α-32P]dATP and [α-32P]dCTP, 50 pg of the activated DNase I and 10 units of E. coli DNA polymerase I (Kornberg enzyme). The reaction was incubated at 16 °C for 2.5-4.0 hours and terminated by adding 75 µl of 1% SDS, 10 mM EDTA and heating at 68 °C for 15 minutes. E. coli tRNA (25 µg) was added as a carrier and the nick-translated DNA probe was separated from unincorporated radiolabelled nucleotides on a small column containing AcA 54 resin (LKB) pre-equilibrated in 10 mM Tris-Cl pH 7.5, 0.2 M NaCl and 0.25 mM EDTA. Fractions of ~400 µl were collected and the specific activity was determined by Cerenkov counting; it is usually in the range of 10^ cpm/µg of input DNA. The probe was denatured with 0.1 M NaOH at 65 °C for 15 minutes, then neutralized with 0.15 M NaH2PO4 and added to the hybridization mix.

Oligonucleotide Probes

About 10-20 pmoles of the oligonucleotide was incubated in 10 µl of 1 x kinase buffer containing 100 mM Tris-Cl pH 8.0, 10 mM MgCl2, 50 mM DTT, 30-50 µCi of [γ-32P]ATP and 2-5 units of T4 polynucleotide kinase. The reaction was incubated at 37 °C for 30-45 minutes and stopped by heating at 68 °C for 10 minutes. The radiolabelled probe was used without separation from the unincorporated radiolabelled nucleotides.

Construction of tRNA4,7Ser- and tRNAArg-Specific Probes by Strand-Synthesis

Approximately 7 µg of pDt5, a recombinant plasmid containing a single tRNA7Ser gene (Newton, 1984; Cribbs et al., 1987b), was digested with HaeIII and DdeI in combination.
From the available DNA sequence data (Newton, 1984), this combination of enzymes should produce a 133 bp fragment containing a truncated tRNA7Ser gene, starting at the HaeIII site at nucleotide 9 within the gene. The DdeI ends of the restriction fragments were repaired by filling in with all four dNTPs using the Klenow enzyme, and the fragments were resolved by electrophoresis in a 5% polyacrylamide gel. The tRNAArg probe was constructed similarly by cutting pDt27R with HaeIII and DdeI. From sequence analysis, this plasmid contains four duplicated tRNAArg genes sharing different extents of almost perfect sequence homology in their flanking regions (Newton, 1984). The restriction digest should produce four overlapping 70 bp fragments, each containing a partial tRNAArg gene truncated at the HaeIII site (see above) and 8 bp 3' to the gene. Restriction fragments corresponding to the predicted sizes from the above two experiments were excised from the gel and the DNA was eluted overnight in 1 ml of M+G elution buffer (p. 66). The fragments were recovered by precipitation with ethanol and cloned into the SmaI site of M13mp9. Single-stranded DNA templates were prepared from randomly chosen bacteriophage plaques, and their nucleotide sequences determined. One clone each containing the coding strand of the predicted partial tRNA7Ser and tRNAArg genes was obtained and used as a template to generate primer-extended hybridization probes. Hybridization probes were constructed by annealing the oligonucleotide primer, Pex, to the template as in the initial step of sequencing by the chain-terminating method. [α-32P]dATP (10"2 pmoles) and 1 µl of a "primer-extension mix" (0.5 mM each of dGTP, dCTP and dTTP) were added, and the reaction was started by incubating the mix with 0.5 units of Klenow enzyme at room temperature for 10 minutes.
The reaction was stopped by heating the tube at 68 °C for 10 minutes and unincorporated radiolabelled nucleotides were removed by chromatography through a small column containing AcA 54 resin as described. The extended probe was denatured in 0.1 M NaOH, neutralized in 0.15 M NaH2PO4 and added to the hybridization mix. For increased detection sensitivity, 5 units of HaeIII was added to release the double-stranded insert from the template. After 30 minutes at 37 °C, solid urea was added to the digest to a final concentration of 7-8 M and the probe was denatured by heating at 90 °C for 3 minutes. The reaction mixture was applied to a 5% polyacrylamide gel containing 8 M urea and electrophoresis was carried out at 800 volts for 2-3 hours. Radiolabelled bands were detected by autoradiography and the single-stranded probe was excised, eluted from the gel and concentrated by ethanol precipitation as described. Alternatively, after the extension step, the probe was denatured from the template by adding 0.1 M NaOH and purified by passage through a Bio-Gel A-5m column equilibrated in 0.1 M NaOH (Dr. D. Cribbs, personal communication).

EMPIRICAL EVALUATION OF GENOMIC LIBRARIES BY SOUTHERN BLOTTING

All newly constructed λ libraries were evaluated for completeness of sequence representation. Approximately 10** E. coli host cells were infected with 2.5 x 10^7 bacteriophage and the infected cells were inoculated into 50 ml of NZYM prewarmed to 37 °C (see Large-Scale Lambda Preparation). The phage particles released from lysed cells were precipitated with 1 M NaCl and 10% PEG-8000 as described and the pellet was dispersed in 1 ml of DNase I buffer (50 mM Tris-Cl pH 7.5, 5 mM MgCl2 and 0.5 mM CaCl2). The suspension was digested with DNase I (100 µg/ml) and ribonuclease A (200 µg/ml) at 37 °C for 30 minutes. Bacterial debris was then pelleted by a brief centrifugation at 12,000 x g and the supernatant transferred to a clean 1.5 ml Eppendorf tube.
SDS (1%), EDTA (5 mM) and proteinase K (150 µg/ml) were added and the tube was incubated at 68 °C for 60 minutes. The supernatant was phenol extracted and the bacteriophage DNA was precipitated with ethanol. The DNA was then digested with various enzymes and the fragments were resolved in a 0.5-0.6% agarose gel, followed by Southern blotting onto a sheet of Hybond nylon membrane. The treatment of the filter and hybridization with various radiolabelled probes were performed as described (see Southern Blotting). A similar protocol for evaluating genomic DNA libraries has been independently developed by Phillips et al. (1985).

SCREENING GENOMIC LIBRARIES

Plating Bacteriophage λ Libraries

Approximately 2 x 10^9 cells were infected with 50,000 bacteriophages and the infected cells were plated on 150 mm NZYM plates as described (see AMPLIFICATION OF GENOMIC LIBRARIES, P. 88). When the phage plaques were nearly confluent, the plates were placed at 4 °C for at least 30 minutes to allow the top agarose to harden. A dry Hybond nylon circle (137 mm) was placed onto the surface of the top agarose for about 1 minute to wet, allowing diffusion of the bacteriophage and free DNA onto the membrane (Benton and Davis, 1977). The orientation was recorded by puncturing holes in three to four asymmetric locations in the membrane and into the agar beneath with a 21-gauge needle. After the membrane was evenly wetted, it was peeled off with a pair of blunt-ended forceps and a replica was made with a second filter following the same procedure, except that it was left on the surface of the top agarose for 30 seconds longer. If the original plaques were small, the filter-bound bacteriophages were amplified by incubating the filter with the phage side up on a fresh NZYM plate overnight (modified from Woo, 1979).

Plating Cosmid Libraries

E. coli cells harboring cosmids were plated on several 150 mm LB-glucose plates supplemented with 40-50 µg/ml of ampicillin at densities between 20,000 and 40,000 cells per plate. The plates were incubated at 37 °C until the colonies were barely visible and were then stored at 4 °C for 1-2 hours to allow the colonies to harden. The colonies were transferred onto a moistened sterile Hybond nylon membrane by blotting as above. A replica copy of the colonies was made by pressing a second moistened sterile membrane to the first, and their orientations relative to one another and to the plate were marked with asymmetrically located holes as described above. The membranes were then placed onto fresh LB-glucose plates (supplemented with 40-50 µg/ml ampicillin) with the colony side up and, along with the original plates, were incubated for approximately 6-8 hours at 37 °C to allow the colonies to grow to 1-2 mm in diameter. The cosmids were amplified overnight by transferring the membrane-bound colonies onto LB plates supplemented with 250 µg chloramphenicol/ml (Hanahan and Meselson, 1980 and 1983).

Lysis of Membrane Bound Bacteriophages or Bacterial Colonies

Membrane-bound phages or bacterial colonies were lysed by floating the filters on a shallow pool of 10% SDS for 5-10 minutes. The membranes were immersed in a denaturing solution (0.5 M NaOH, 1.5 M NaCl) and then neutralized in 1 M Tris-Cl pH 7.5, for 2 to 10 minutes at each step. They were then washed briefly in 0.5 M Tris-Cl pH 7.5, 1.5 M NaCl and dried in air for 30 minutes. The DNA was immobilized onto the membrane by irradiation with UV (254 nm) for 2-3 minutes as described in the supplier's manual (Amersham).

Prehybridization

The membranes were washed in several changes of 3 x SSC, 0.1% SDS at 65 °C and gently scrubbed with a toothbrush to remove bacterial debris. The membranes were prehybridized in 10-20 ml of 1 x Denhardt solution, 6 x SSC, 0.1% SDS and 1 mM EDTA for 5 minutes to overnight at 65 °C.
If an oligonucleotide was to be used as a probe, the membranes were prehybridized at temperatures between 37 and 60 °C in 10 x Denhardt solution, 6 x SSC and 0.1% SDS (Zoller and Smith, 1983). Prehybridization was carried out in a petri plate, and a circular piece of Mylar plastic cut to size was placed on top of the stack of membranes to ensure that they remained submerged. The petri plate was then placed inside a Pyrex dish with a few sheets of moist paper towels and the assembly was sealed with Saran wrap. Occasionally, carrier DNA (calf thymus or salmon sperm) was also included in the prehybridization buffer.

Hybridization

For nick-translated or primer-extended probes, the membranes were hybridized in 1 x Denhardt solution, 6 x SSC, 0.1% SDS and 1 mM EDTA in a petri plate at 65 °C for 8-14 hours. The volume of the hybridization mix was kept to a minimum and contained about 10& cpm of radiolabelled probe per filter. After hybridization, the filters were washed 3-4 times in 1 x SSC, 0.5% SDS (w/v) at 65 °C to remove excess probe. If the background remained unacceptably high after the initial washes, as determined either by autoradiography or by monitoring with a Geiger counter, the membranes were re-washed several times in 0.2 x SSC, 0.1% SDS (w/v) at 68 °C, aided by gentle scrubbing with a gloved hand.

When an oligonucleotide probe was used, it was first heated at 65 °C for 5 minutes to denature any secondary structures before it was added to the hybridization mix containing 10 x Denhardt solution, 6 x SSC and 0.1% SDS. The hybridization was conducted at temperatures determined by the formula Td (°C) = [4(G+C) + 2(A+T)] - 5, where Td equals the hybridization temperature in °C and G, C, A and T are the numbers of the respective nucleotides in the probe (Meinkoth and Wahl, 1984).
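As a worked illustration of the formula above, the following minimal sketch computes Td for an oligonucleotide probe. The function name and the 14-nucleotide example sequence are hypothetical and are not taken from this work.

```python
def hybridization_temp(oligo: str) -> int:
    """Return Td (deg C) = [4(G+C) + 2(A+T)] - 5 for an oligonucleotide probe."""
    seq = oligo.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 4 * gc + 2 * at - 5


# Hypothetical 14-mer with 7 G+C and 7 A+T residues: 4*7 + 2*7 - 5 = 37.
example_oligo = "ACGTACGTACGTAC"
print(hybridization_temp(example_oligo))
```

The rule is an empirical approximation for short probes, so the calculated value is best treated as a starting point to be adjusted by trial washes.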
After 1-3 hours, the membranes were washed once briefly in 6 x SSC, 0.1% SDS (w/v) at room temperature to eliminate most of the excess probe, and the washings were then repeated twice more with the same solution at temperatures contingent upon the nucleotide content of the probe (see above). After the washes, the filters were placed on a sheet of expired X-ray film as support (bleached to remove the film coating) and the hybridization signals were detected by autoradiography at -70 °C. Cronex enhancing screens were used whenever possible.

Isolation and Purification of λ Clones

Usually the positive signals on the film could not be assigned to a single phage plaque because of the high plating densities, and thus a plug of agar was removed from the corresponding area of the plate with the wide end of a sterile Pasteur pipet. The mixture of phages was eluted in 1 ml of SM buffer, and the suspension was replated at lower densities by 10-fold serial dilutions. Plaque lifts and screening of filter-bound phage DNA by hybridization to radiolabelled probes were conducted as described. Isolated positive plaques were removed with a sterile Pasteur pipet and eluted in 100 µl of SM buffer. The phage eluate was added to 100 µl of the appropriate E. coli host resuspended in 10 mM MgCl2 and incubated for 15 minutes at 37 °C. The infected cells were inoculated into 50 ml of NZYM and grown at 37 °C until the cells began to lyse (6-8 hours). The purification of released bacteriophage and the isolation of DNA have been described (see Large Scale Lambda Preparation on P. 79 and Empirical Evaluation of Genomic Libraries on P. 91).

Isolation and Purification of Cosmid Clones

Cosmid colonies recovered on the agar plug were resuspended in 1 ml of 0.5% NaCl. The resuspended cells were replated at lower densities on LB-glucose plates (+40 µg/ml ampicillin) and rescreened with radiolabelled probes as described.
Well-isolated positive colonies were inoculated into 5 ml of LB-glucose (40 µg/ml of ampicillin) in a 125 ml Erlenmeyer flask and shaken vigorously at 37 °C for 14-16 hours. The culture was transferred into a 14 ml graduated polypropylene tube and the cells were harvested by centrifugation for 15 minutes in a clinical centrifuge. The pellet was dispersed in 100 µl of 50 mM glucose, 25 mM Tris-Cl pH 8.0 and 1 mM EDTA, and the plasmid DNA was isolated by the alkaline lysis method as described (see Small Scale Mini-Prep, P. 61).

RESTRICTION ENDONUCLEASE DIGESTS

For general analytical restriction digests, five different buffers were found to accommodate adequately the spectrum of restriction enzyme requirements throughout this work. All restriction buffers were stored at -20 °C as 10 x stocks containing 100 mM Tris-Cl pH 7.8, 100 mM MgCl2, 10 mM EDTA and 60 mM β-mercaptoethanol, in addition to one of the following salts: 0 mM NaCl, 0.6 M NaCl, 1.0 M NaCl, 1.5 M NaCl or 60 mM KCl. For genomic library construction or for restriction endonucleases with more fastidious requirements, a separate set of restriction buffers was prepared as 10 x stocks according to the specifications of the suppliers, filter sterilized and stored as small aliquots at -20 °C (see Table IV).

Restriction digests were performed in 1.5 ml Eppendorf tubes in accordance with the supplier's instructions. The final DNA concentration in the reaction was never more than 200 µg/ml, and digests were usually carried out with 2-5 fold the recommended enzyme units. Plasmids isolated from "mini-preps" containing large amounts of contaminating RNA were digested with the inclusion of ribonuclease A at a final concentration of 40 µg/ml. Restriction digestions requiring more than one endonuclease were always performed sequentially, with small aliquots of the reactions removed for analysis by gel electrophoresis in between steps.
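The arithmetic behind setting up such a digest can be sketched as below. This is an illustration only: the function names are hypothetical, and the default of 1 enzyme unit per µg of DNA as the "recommended" amount is an assumption rather than a figure from this work. The 10 x stock composition, the 200 µg/ml ceiling and the 2-5 fold enzyme excess are taken from the text.

```python
# 10 x common restriction buffer as described in the text (mM); the salt
# (NaCl or KCl) differs between the five buffers and is left out here.
STOCK_10X_MM = {
    "Tris-Cl pH 7.8": 100.0,
    "MgCl2": 100.0,
    "EDTA": 10.0,
    "beta-mercaptoethanol": 60.0,
}


def working_strength(stock_mm, fold=10.0):
    """Concentrations (mM) after diluting a stock 'fold' times."""
    return {name: conc / fold for name, conc in stock_mm.items()}


def digest_setup(dna_ug, fold_excess=2.0, units_per_ug=1.0,
                 max_dna_ug_per_ml=200.0):
    """Smallest reaction volume keeping DNA at or below the ceiling,
    the matching volume of 10 x buffer, and the enzyme units to add.
    units_per_ug is an assumed 'recommended' amount (hypothetical)."""
    if dna_ug <= 0:
        raise ValueError("dna_ug must be positive")
    volume_ul = dna_ug * 1000.0 / max_dna_ug_per_ml
    return {
        "min_volume_ul": volume_ul,
        "buffer_10x_ul": volume_ul / 10.0,
        "enzyme_units": dna_ug * units_per_ug * fold_excess,
    }


print(working_strength(STOCK_10X_MM))
print(digest_setup(5.0))
```

For 5 µg of DNA the sketch gives a minimum volume of 25 µl; in practice a larger volume would simply lower the DNA concentration further below the ceiling.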
For digestions involving enzymes requiring different concentrations of the same salt, the enzyme with the lower salt requirement was used first; when the first reaction was complete, the salt concentration of the digest was appropriately adjusted before adding the second enzyme. If different salts were required by the restriction enzymes (e.g. NaCl and KCl), after completion of the digest with the first enzyme, the mix was dialyzed as a droplet on Millipore VM filters (0.05 µm in pore size) floating on the surface of 5 ml of TE inside a small petri plate (Dr. Robert Devlin, personal communication). After 30 minutes, the droplet was recovered into a clean 1.5 ml Eppendorf tube, one-tenth volume of the second restriction buffer was added and the reaction was continued with the addition of the second enzyme.

TABLE IV. SPECIFIC BUFFERS FOR RESTRICTION ENZYMES USED IN LIBRARY CONSTRUCTION

           Final composition in mM*
Buffer     Tris-Cl   MgCl2   NaCl   KCl   DTT   pH
BamHI      10        10      100    -     10    7.5
EcoRI      10        10      150    -     10    7.5
HpaI       10        10      -      50    10    7.5
HindIII    10        10      60     -     10    8.0
SalI       8         6       150    -     10    7.6
MboI       50        10      50     -     -     8.0

*All buffers were made as 10 x stocks.

GEL ELECTROPHORESIS

Agarose Gels

Agarose was dissolved by boiling in the appropriate volume of 45 mM Tris-Cl pH 8.3, 45 mM boric acid and 1 mM EDTA (0.5 x TBE). After cooling to 55-60 °C (warm to the touch), ethidium bromide was added to a final concentration of 1 µg/ml and the solution was poured into Plexiglass trays; sample slots were moulded by inserting a plastic comb at one end and the gel was allowed to solidify at room temperature. For routine analytical gel electrophoresis, such as monitoring the progress of a restriction digest, "mini-gels" measuring 6.5 cm x 10 cm x 0.4 cm were cast; for preparative gels, dimensions of 20 cm x 25 cm x 0.5 cm were used.
For the low percentage agarose gels (0.2-0.3%) required for the analysis of high molecular weight genomic DNA, a supporting frame of 0.5% agarose was cast with a Plexiglass mould and allowed to set before a 0.2-0.3% agarose solution was poured into the center. Electrophoresis was carried out horizontally with the gel submerged in 0.5 x TBE at 2-5 volts/cm. The gel was photographed over a UV transilluminator, or by shadowing with a hand-held UV lamp, using Polaroid type 667 film in a Polaroid MP-4 camera.

Acrylamide Gels

A stock containing 40% acrylamide and 2% bis-acrylamide in deionized water was stored in a brown bottle at 4 °C. Non-denaturing gels were prepared by mixing appropriate volumes of the acrylamide stock and 10 x TBE (final 0.5 x TBE) and filtering through Whatman glass microfibre filters (934-AH). The gel solution was degassed, and ammonium persulfate was added to 0.06% from a 10% stock and N,N,N',N'-tetramethylethylenediamine (TEMED) to 0.05-0.1%. The preparation of denaturing gels was identical except that solid urea was added to a final concentration of 8.4 M before addition of the catalysts.

Glass gel plates, measuring 20 cm x 35 cm, were scrubbed clean with scouring powder and rinsed with water. After air drying, the inner surfaces of the plates were washed with 95% ethanol and then 2% dimethyldichlorosilane dissolved in heptane was liberally applied. A Kimwipe saturated with 95% ethanol was used to remove excess dimethyldichlorosilane. Mylar spacers 0.35 to 0.5 mm thick were placed between the plates, which were then assembled with electrical tape. The acrylamide solution was poured slowly down one side of the space between the plates to avoid trapping air bubbles. Sample slots were cast by inserting a gel comb into the top of the gel solution, and the sides of the plates were tightly clamped to ensure good contact between the plates, the spacers and the gel comb.
After polymerization (1 hour), the slot former and the electrical tape along the bottom of the plates were removed and the gel was clamped into a vertical electrophoresis apparatus. Both the top and the bottom reservoirs were filled with 0.5 x TBE. The slots were flushed clean just before the samples were loaded and the gels were run at 2-10 V/cm. After electrophoresis, the tape along the edges of the plates was removed and the plates were separated with the aid of a thin spatula. Distilled water containing 1 µg/ml ethidium bromide was poured onto the gel and distributed evenly across the surface with a glass spreader. After 20 minutes, the DNA bands were visualized and photographed under UV illumination. If the DNA was radiolabelled, the gel was wrapped in Saran wrap and autoradiographed directly.

"Wedge" shaped sequencing slab gels were cast in siliconized plates as described by Chen and Seeburg (1985). The plates were initially separated by 0.35 mm thick Mylar spacers as in regular thin sequencing gels. However, progressively shorter strips of 0.17 mm thick Mylar (1/3, 1/7 and 1/10 of the gel length) were inserted into the bottom edge of the gel to increase the thickness to 0.86 mm. After the acrylamide solutions (6% or 8%) were poured between the plates, a slot former with 0.25 cm wide teeth was inserted at the top. Sequencing reactions of up to 1 µl were loaded into each slot immediately after it was flushed clean of urea, and electrophoresis was carried out at 30-32 watts (constant power). The variable gel thickness causes the DNA to migrate more slowly as it approaches the thicker bottom. This results in even spacing of all adjacent DNA fragments throughout the gel. To prevent the "smiling" of samples close to the edges, an aluminum plate (20 cm x 20 cm) was clamped to the gel assembly to maintain an even temperature distribution across the surface.
After electrophoresis, the gel was transferred onto a sheet of Whatman 3MM paper by blotting and dried completely in a slab gel dryer (Bio-Rad). The gel was then covered with a sheet of Saran wrap and autoradiography was performed by placing a sheet of X-ray film in direct contact with the dried gel and exposing it at -70 °C. With double-stranded sequencing, the exposure time can be as short as one hour.

Polyacrylamide gels for resolving Maxam and Gilbert sequencing reactions were regular "non-wedged" gels at concentrations between 8 and 20%. After electrophoresis, the gels were protected in Saran wrap without drying. Autoradiography was performed at -70 °C with the aid of intensifying screens whenever possible.

RECOVERY OF RESTRICTION FRAGMENTS FROM GELS

Agarose Gels

Small restriction fragments under 4 kb were recovered using DEAE membranes according to Dretzen et al. (1981). After the DNA fragments from a restriction digest were sufficiently resolved by electrophoresis, a small piece of DEAE membrane (rinsed in distilled water) was inserted into the gel through a slit cut perpendicular to the desired DNA band. The gel was turned through 90°, with the membrane now nearest to the positive electrode, and the DNA was transferred onto the membrane electrophoretically. The membrane was then rinsed by vortexing in 1 ml of 0.15 M NaCl, 0.1 mM EDTA and 20 mM Tris-Cl pH 8.0 to remove any adhering agarose. The DNA was eluted from the membrane by incubation in 100 µl of 1.0 M NaCl, 0.1 mM EDTA and 20 mM Tris-Cl pH 8.0. The supernatant was extracted several times with n-butanol to remove the ethidium bromide. After adding one-half volume of 7.5 M ammonium acetate, the DNA was recovered by precipitation with two volumes of ethanol. The recovery of DNA from the DEAE membrane was approximately 60-70%, but was markedly less efficient with fragments above 4 kb.
Larger restriction fragments above 4 kb were recovered either by electroelution into a dialysis sac as described (Size Fractionation of D. melanogaster DNA, P. 73) or by using low melting point (LMP) agarose. The LMP agarose was cast as for normal agarose except that solidification was at 4 °C. Restriction fragments were resolved electrophoretically at room temperature and specific fragments were excised with a scalpel. The gel slice was placed in a 1.5 ml Eppendorf tube and melted by heating at 70 °C for 10 minutes, and sufficient TE was added to give a final volume of 0.7 ml. The solution was extracted twice with equal volumes of phenol. The aqueous phase was transferred to a clean 1.5 ml Eppendorf tube and re-extracted twice with equal volumes of 1:1 phenol:chloroform (v/v) and once with chloroform alone, and the volume of the aqueous phase was then reduced by repeated extraction with n-butanol. Ammonium acetate was added (50% volume) and mixed by vortexing, and the DNA was precipitated by addition of two volumes of ethanol. The pellet was rinsed several times with 70% ethanol, dried briefly in vacuo and redissolved in a small volume of TE. The concentration of the fragment was estimated by gel electrophoresis against known marker DNA.

Polyacrylamide Gels

The recovery of specific DNA fragments by elution from acrylamide gel slices has been described (see Maxam and Gilbert DNA Sequencing, P. 66).

SOUTHERN TRANSFER

Transfer was performed essentially as described by Southern (1975) with the minor modifications discussed below. To facilitate transfer, the DNA was partially depurinated by submerging the agarose gel in 250 ml of 0.25 M HCl for about 30 minutes, and the gel was then briefly rinsed several times in tap water (Alwine et al., 1979). The DNA was denatured and cleaved at the depurinated residues in situ with 250 ml of 0.5 M NaOH, 1.5 M NaCl for 30 minutes, followed by repeated rinses in tap water.
The gel was then neutralized in 250 ml of 1.5 M ammonium acetate, 0.02 M NaOH for 30 minutes (Frei et al., 1983) and placed upside down on 2-3 sheets of Whatman 3MM paper saturated with the same buffer (Wahl et al., 1979). A piece of Hybond nylon cut to size was rinsed in the 1.5 M ammonium acetate, 0.02 M NaOH solution and then placed on top of the gel, followed by several sheets of dry Whatman 3MM paper and a stack of paper towels. Transfer of the DNA onto the nylon membrane was essentially complete after approximately 2 hours (Meinkoth and Wahl, 1984).

For acrylamide gels, the DNA fragments were usually small enough to obviate the acid treatment. The gel was denatured and neutralized with the same solutions and then transferred onto a dry sheet of Whatman 3MM paper to facilitate handling. To prevent the gel from adhering irreversibly to the nylon filter during transfer, the gel was covered with a thin layer of 0.5% agarose just prior to overlaying with the membrane (Gergen et al., 1981). The assembly was then placed on top of several sheets of Whatman 3MM wicks connected to a reservoir containing approximately 500 ml of 1.5 M ammonium acetate, 0.02 M NaOH. Transfer was carried out for approximately 16-24 hours.

After transfer, the filter was dried in air and the DNA was immobilized onto the filter by irradiation with UV for 3 minutes (Amersham). The filter was washed in 0.1% SDS at 65 °C for 20 minutes as a substitute for prehybridization. Hybridization to radiolabelled probes and the washes to remove excess probe were performed as described (see Screening Libraries, P. 93).

RESTRICTION MAPPING

Low Resolution Restriction Endonuclease Mapping

Low resolution mapping was routinely performed by single and multiple restriction digests as described by Danna (1980). In most cases, λ and cosmid mapping data were derived from composite maps based on subclones of smaller restriction fragments.
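The logic of mapping by single and multiple digests can be illustrated with a short sketch: a candidate map is accepted when the fragment sizes it predicts for each single digest and for the double digest all agree with the sizes observed on the gel. The function names, the 10 kb fragment and the site positions below are hypothetical and serve only to illustrate the comparison.

```python
def fragments(length, sites):
    """Fragment sizes (largest first) from complete digestion of a linear
    DNA of the given length cut at the given site positions."""
    cuts = sorted(set(sites))
    if any(s <= 0 or s >= length for s in cuts):
        raise ValueError("sites must lie strictly inside the fragment")
    edges = [0] + cuts + [length]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)


def double_digest(length, sites_a, sites_b):
    """Fragment sizes expected when both enzymes cut to completion."""
    return fragments(length, list(sites_a) + list(sites_b))


# Hypothetical 10 kb linear fragment with two sites for enzyme A
# and one site for enzyme B (positions in bp from the left end).
LENGTH = 10000
SITES_A = [2000, 7000]
SITES_B = [4500]

print(fragments(LENGTH, SITES_A))               # enzyme A alone
print(fragments(LENGTH, SITES_B))               # enzyme B alone
print(double_digest(LENGTH, SITES_A, SITES_B))  # A + B together
```

Here the 5000 bp fragment of the A digest disappears in the double digest and is replaced by two 2500 bp fragments, which places the B site inside that A fragment.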
Electrophoresis in agarose or polyacrylamide gels was used routinely to display the digestion products.

Restriction Endonuclease Mapping by Partial Digestion

Higher resolution mapping of smaller DNA fragments was performed by partial restriction endonuclease digestion as described by Smith and Birnstiel (1976). DNA fragments to be mapped were gel purified and radiolabelled at their 3' ends with [α-32P]-deoxyribonucleoside-5'-triphosphates and the Klenow enzyme (see Maxam Gilbert Sequencing). The fragments were suspended in 50 µl of the appropriate 1 x restriction buffer with about 1 µg of unlabelled calf thymus DNA as a carrier. Approximately 1 unit of the appropriate restriction endonuclease was added, mixed and incubated at 37 °C, or at 65 °C for TaqI. Aliquots of 10 µl were removed at various times and the reaction was stopped by the addition of 1 µl of stop mix (0.25 M EDTA, 12.5% Ficoll, 0.05% bromophenol blue and 0.05% xylene cyanol). The partially digested products were resolved by electrophoresis in a 5% polyacrylamide gel and the radiolabelled fragments were detected by autoradiography.

A Novel Restriction Endonuclease Mapping Method By Indirect Labelling with Sequencing Oligonucleotide Primers

I have developed an alternative restriction mapping method based on the "indirect end-labelling" technique used by Wu (1980) to map the DNase I hypersensitive sites 5' to the Drosophila heat shock genes hsp70. The advantage of the Oligonucleotide Indirect Labelling (OIL) method is that purification of end-labelled fragments is unnecessary and no special "mapping" vectors are required, such as those utilizing sequences (Little and Cross, 1985) or SP6, T7 and T3 promoters (Wahl et al., 1987) as reference points. A recombinant plasmid, in pUC or pEMBL, was digested with one of the restriction enzymes having a rare recognition site 5' to the sequencing primer annealing site at nucleotides 379-395 (Yanisch-Perron et al., 1985).
It was then redigested with another endonuclease which cleaves 3' to the cloned insert. Changes in restriction buffers were accomplished by dialysis on Millipore filters (see Restriction Endonuclease Digests, P. 96). The DNA was then divided into several aliquots, and one-tenth volume of the appropriate 10 x restriction buffer was added to each tube. The respective endonucleases for mapping (1-3 units) were then added to each tube, mixed and incubated at 37 °C for most of the enzymes used, or at 65 °C for TaqI. Aliquots of the digests were removed at time points between 3 and 30 minutes and transferred to 1.5 ml Eppendorf tubes containing several µl of a stop mix (0.25 M EDTA, 12.5% Ficoll, 0.5% bromophenol blue and 0.5% xylene cyanol). The digested products were resolved in 0.8 to 1.0% agarose gels containing 1 µg/ml ethidium bromide. After electrophoresis, the DNA was transferred onto Hybond nylon for about 2-4 hours by the method of Southern (1975) as modified by Meinkoth and Wahl (1984). The filter was then hybridized to the forward sequencing primer, F1, radiolabelled with [γ-32P]ATP and T4 polynucleotide kinase. Since F1 does not abut the end of the released restriction fragment, care was taken to choose mapping enzymes that cleave only 3' to the priming site. The partially digested products specific to the cloned insert were detected by autoradiography for 6 hours to overnight (fig. 10). The entire procedure can be accomplished in 2-3 days, which is slightly faster than the well-established Smith and Birnstiel method. The latter method generally required more manipulations and a longer exposure time for autoradiography.

Fig. 10. Restriction mapping by the oligonucleotide indirect labelling method. The insert (thin line) is first released from the vector (thick line) by restriction cutting, in this example with PvuII at the 5' end, including the priming site, and with PstI at the 3' end within the polylinker cloning sequence. It is important to note that the 5' cutting site must be upstream from the sequencing priming site. A variety of such ideal sites are available, which are otherwise very rare cutters (consult Yanisch-Perron et al., 1985). The 3' cutting site can usually be found conveniently in the polylinker sequence, but sites internal to the cloned insert can also be used, as long as the region to be mapped is included in the released fragment. The distribution of sites for a particular restriction enzyme within the released insert is then determined by partial digestion (HpaII sites in this particular example), and the mixture of fragments (including both insert and vector) is resolved by gel electrophoresis, followed by Southern transfer onto a sheet of membrane. Fragments specific to the insert, spanning from the fixed PvuII site to the various HpaII sites, are then indirectly labelled by hybridization to the sequencing primers, either F1 or Pex (table I). These specific fragments are depicted as thick rectangular blocks in the autoradiography cartoon. Since the hybridization is specific to the insert, purification of the fragment away from vector sequences is unnecessary. As in all other published restriction mapping methods, the only limitation is the extent of resolution of the fragments by gel electrophoresis.

[Figure 10 diagram: release of the insert with PvuII and PstI; HpaII partial digestion; resolution by gel electrophoresis; Southern hybridization to F1 or Pex; autoradiography.]

MOLECULAR CLONING IN PLASMID AND DOUBLE STRANDED M13 BACTERIOPHAGE VECTORS

Restriction Endonuclease Digestion of Vector DNA

Approximately 5 µg of vector DNA was digested with a 2-4 fold excess of the appropriate restriction endonuclease in 50 µl of 1 x restriction buffer at 37 °C.
After 2 hours, an additional 1-2 units of the restriction endonuclease were added and the digest was incubated further at 37 °C to ensure that the vector was cut to completion. The enzyme was inactivated by heating at 68 °C for 15 minutes.

Dephosphorylation of Vector DNA

To prevent religation of the vector DNA, the 5' phosphate was removed with calf intestinal phosphatase (CIP). It was noted that CIP does not absolutely require CIP buffer for activity but can also function efficiently in all restriction buffers (C. H. Newton, personal communication); however, the relative efficiencies under the two different sets of conditions have not been systematically explored. In general, about 0.5 units of the enzyme were added directly to the restriction mix for each µg of DNA. The dephosphorylation reaction was conducted at 45-55 °C for 30 minutes (BMC data). The enzyme was then inactivated by adding trinitriloacetic acid to 10 mM (a Zn chelator) and heating at 68 °C for 15 minutes (Frischauf et al., 1983). Undesirable salts and monophosphates, which can inhibit DNA ligase, were removed by dialysis on Millipore filter discs over TE for 30 minutes (see Restriction Digests, P. 96), and the vector DNA was diluted to a final concentration of 0.1 mg/ml with TE and stored as a stock at -20 °C.

DNA Ligation

Bacteriophage M13 vector DNA was used in amounts of 20-50 ng, and plasmid vectors in amounts of 100-200 ng, per ligation reaction. The efficiency of recovering recombinant clones was empirically determined by mixing various quantities of insert DNA with a constant amount of vector and ligating them in a 10 µl volume containing 1 µl of 10 x T4 ligase buffer (66 mM Tris-Cl pH 7.5, 5 mM MgCl2, 5 mM DTT) and a final concentration of 0.4 mM ATP.
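One way to choose the "various quantities of insert" for such an empirical titration is to express them as insert:vector molar ratios. The sketch below is an illustration under stated assumptions: the function names, the 4 kb vector, the 1 kb insert and the ratio series are all hypothetical, and the thesis does not specify which ratios were tested.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.
    For double-stranded DNA, moles scale as mass divided by length."""
    if min(vector_ng, vector_bp, insert_bp, molar_ratio) <= 0:
        raise ValueError("all arguments must be positive")
    return vector_ng * molar_ratio * insert_bp / vector_bp


def titration(vector_ng, vector_bp, insert_bp, ratios=(0.5, 1.0, 3.0, 10.0)):
    """Insert masses for a series of molar ratios at constant vector mass."""
    return [(r, insert_mass_ng(vector_ng, vector_bp, insert_bp, r))
            for r in ratios]


# Hypothetical example: 100 ng of a 4 kb plasmid vector and a 1 kb insert.
for ratio, mass in titration(100.0, 4000, 1000):
    print(f"insert:vector {ratio:>4}:1 -> {mass:.1f} ng insert")
```

At an equimolar ratio the 1 kb insert is needed at one quarter of the vector mass, which is why small inserts are easily added in large molar excess.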
For intermolecular ligation involving "sticky-ends", 0.1 units of T4 DNA ligase was added and the reaction was incubated at 12-16 °C overnight; if the ligation involved blunt ends, 1 unit of T4 DNA ligase was added instead and the reaction was incubated at 4 °C overnight. A recently published method was also used to enhance the efficiency of both "sticky-end" and blunt-end intermolecular ligation (Hayashi et al., 1986). The T4 DNA ligase buffer was made as a 5 x stock (0.33 M Tris-Cl pH 7.6, 33 mM MgCl2, 50 mM DTT, 0.5 mM ATP, 50% PEG-8000 and 0.75 M NaCl) and the reaction was carried out at 16 °C for 30 minutes to 4 hours with either 0.6 units ("sticky-ends") or 7 units (blunt ends) of T4 DNA ligase. The ligation products can be used directly to transform E. coli with satisfactory results without further manipulation.

PREPARATION OF [3'-32P] tRNA

Synthesis of Cytidine 3', 5'-Diphosphate

The labelling of tRNA molecules at the 3' end was conducted as described by Tanaka et al. (1980) and modified according to Dr. D. L. Cribbs (unpublished). Cytidine 3'-monophosphate (6.1 nm) was phosphorylated at the 5' end with approximately 3 pmoles of [γ-32P]-ATP and 2 units of T4 polynucleotide kinase in a 10 µl mixture containing 10 mM Tris-Cl pH 8.3, 10 mM MgCl2 and 10 mM DTT. The reaction was incubated at 37 °C for 60 minutes and the enzyme was then inactivated by heating in a boiling water bath for 1 minute.

RNA Ligase-Catalyzed Addition of [5'-32P]-pCp

1-2 µl of the product above, [5'-32P]-pCp, was used to radiolabel the 3' end of 1-2 µg of tRNA (purified tRNAs were a gift from Dr. I. C. Gillam and total 4S RNA was a gift from V. Dartnell) by using T4 RNA ligase in a 30 µl reaction volume containing 50 mM HEPES pH 8.3, 10% DMSO, 15% glycerol, 10 mM MgCl2, 3 mM DTT and 5 mM ATP. The ligation reaction was conducted at 4 °C for 16-24 hours and terminated by adding 1 ml of 2 x SSC.

DNA DOT BLOTS

Nylon membranes were washed in distilled water and then rinsed in 1 M ammonium acetate.
They were then placed on a platform consisting of dry paper towels on the bottom and moist Whatman 3MM paper on top. The assembly was covered in Saran wrap to prevent drying of the membranes. Plasmid DNA (both single- and double-stranded) was denatured in 0.3 to 0.4 M NaOH for 10 minutes at room temperature and then chilled on ice. Just prior to spotting onto the membrane, the DNA was diluted with an equal volume of cold 2 M ammonium acetate. The samples were taken up with a Pipetman and the DNA was delivered manually onto the membranes as small spots of 2 to 3 mm. The DNA spots were rinsed with drops of 1 M ammonium acetate, and the membranes were then washed in 200 ml of 4 x SSC to remove dust particles, followed by drying in air (Kafatos et al., 1979). The DNA was then immobilized onto the nylon by UV irradiation for 3 minutes (Amersham). The conditions for prehybridization, hybridization and washes to remove excess probe were identical to those described for screening genomic libraries (P. 93).

ORIENTATION OF tRNA4,7Ser GENE TRANSCRIPTION

DNA fragments containing tRNA4,7Ser genes were cut with two different restriction endonucleases and cloned into either M13 or pEMBL vectors. The two different restriction ends permit cloning of the DNA fragment in only one orientation, with the same compatible ends within the vectors. tRNA4,7Ser genes cloned into bacteriophage M13 were transformed into E. coli strains JM101 or JM103 and the transformants were plated on YT plates. In the case of pEMBL vectors, the transformants were superinfected with the helper phage IR1 before plating the in vivo packaged virions. Either plaque lifts or DNA dot blots were prepared from the virions and probed with the tRNA4,7Ser-specific oligonucleotides GT6 and GT7, which correspond to nucleotides 1 to 14 of the non-coding strand and nucleotides 40 to 58 of the coding strand, respectively.
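The strand-specific readout of such an experiment can be sketched as follows: a single-stranded template gives a signal with an oligonucleotide only if it carries the complement of that oligonucleotide. The sequences and names below are hypothetical stand-ins and are not the GT6 or GT7 sequences; the sketch also ignores partial or mismatched hybridization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(oligo, template):
    """True if the single-stranded template contains a perfect
    complement of the oligonucleotide."""
    return reverse_complement(oligo) in template.upper()


# Hypothetical insert: 'top' is one strand, 'bottom' its complement.
top = "GATTACAGGCCTTAACCGGTTAGC"
bottom = reverse_complement(top)

# oligo_a has the sequence of the top strand (its 5' end);
# oligo_b has the sequence of the bottom strand (its 5' end).
oligo_a = top[:8]
oligo_b = reverse_complement(top[-8:])

for name, virion_strand in (("top", top), ("bottom", bottom)):
    print(name, "strand packaged:",
          "oligo_a signal =", hybridizes(oligo_a, virion_strand),
          "oligo_b signal =", hybridizes(oligo_b, virion_strand))
```

Because force-cloning fixes which strand of the insert is packaged, the pattern of positive and negative signals with the two oligonucleotides identifies the strand present in the virion.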
The direction in which the tRNA4,7Ser genes were transcribed can be deduced from the hybridization results with the strand-specific oligonucleotides, coupled with the orientation of the cloned insert in question (fig. 11).

Fig. 11. Transcription orientation of tRNA genes. Restriction fragments containing tRNA genes are excised with two different enzymes (for example, EcoRI and XbaI). The fragment is then force-cloned in one orientation in vectors capable of producing single-stranded DNA (M13 and pEMBL). One strand of the DNA is extruded into the growth medium as virions. The purified DNA is used as template in either sequencing reactions or dot blot hybridization using the two tRNA4,7Ser gene-specific oligonucleotide primers GT6 and GT7. At the bottom of the figure are two possible orientations of a hypothetical tRNA gene. The possibilities can be differentiated by their hybridization behaviour with respect to the two oligonucleotides. [Diagram: EcoRI-XbaI fragments in both orientations, annotated with GT6, 5'-GCAGTCGTGGCCGA-3' (nucleotides +1 to +14), and GT7, 5'-CGCTCCCAGAGGGAATCTG-3' (nucleotides +58 to +40).]

CHAPTER I

Characterization of the Entire tRNASer Gene Cluster at Polytene Bands 12DE by Chromosomal Walking

The two hybrid tRNASer gene sequences, 474 in pDt73 and 774 in pDt16R, that were studied by Cribbs (1982) have been hypothesized to be products of gene conversion between the bona fide 444 and 777 genes. However, based on these limited results, the alternative possibility implicating standard reciprocal exchanges between the two bona fide gene types cannot as yet be dismissed. The two alternative possibilities can be distinguished by the fact that gene conversion involves unidirectional transfer of genetic information, and hence, the hypothetical and reciprocal hybrid sequences (747 and 447) would not be expected.
On the other hand, reciprocal exchanges involve bidirectional transfer of genetic information and, barring differential selection on the recombinants, the process should yield equal numbers of reciprocal hybrid genes. As an initial step in determining whether gene conversion can be sustained as a viable hypothesis, the entire gene cluster at 12DE was characterized by a chromosomal walk. The results, along with those obtained for the autosomally linked tRNA4,7Ser genes (D. A. R. Sinclair, unpublished observations), should provide the critical insight into whether these hybrid sequences are reciprocal products or not.

The walk at bands 12DE was initiated by using plasmids pDt27R, pDt73, pDt16R and pDt17R as entry probes ("R" for these small plasmid clones designates reclones of single HindIII Drosophila inserts as described in Dunn et al., 1979b). Each entire plasmid, or a purified equivalent restriction fragment, was radiolabelled by nick-translation and then used to screen genomic lambda or cosmid libraries by hybridization to filter-bound recombinant clones (see Methods and Materials). DNA isolated from such putative genomic clones was initially characterized at low resolution by mapping with restriction endonucleases that recognize hexanucleotide sites. Unique restriction fragments that were the furthest from the initial probes were purified and were in turn used as radiolabelled probes to isolate adjacent DNA segments in both directions further along the chromosome. The identification of an overlapping clone thus represents a single step in the chromosomal walk. The walk was continued stepwise in this way, and the final nested set of overlapping genomic clones should, when properly aligned with their restriction sites coincident, yield a composite map representative of a large chromosomal region.
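The reciprocal-product argument set out in the chapter introduction can be sketched in a few lines. This is only an illustration: the three-character labels follow the text's 444/777/474/774 notation, while the function names and the choice of input set are assumptions made for the example, not part of the original analysis.

```python
def reciprocal(gene_type: str) -> str:
    """Return the reciprocal hybrid of a three-position gene type.

    Each character is the tRNA4- or tRNA7-type state at one of the three
    diagnostic positions, so the reciprocal product of an exchange simply
    swaps every 4 for a 7 and every 7 for a 4.
    """
    swap = {"4": "7", "7": "4"}
    return "".join(swap[c] for c in gene_type)


def missing_reciprocals(observed):
    """List the reciprocal hybrids that are expected under reciprocal
    exchange but absent from the observed gene types."""
    hybrids = {g for g in observed if len(set(g)) > 1}  # skip 444 and 777
    return sorted(reciprocal(g) for g in hybrids if reciprocal(g) not in observed)


# Gene types named in the chapter introduction: two bona fide, two hybrid.
observed_types = {"444", "777", "474", "774"}
absent = missing_reciprocals(observed_types)
print(absent)
```

Under reciprocal exchange the returned list should eventually become empty as more genes are recovered; a list that stays non-empty is what unidirectional gene conversion predicts.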
The validity of the chromosomal organization, as inferred from the ensemble of recombinant molecules, can be assured to a high degree of confidence by the consistency of the restriction maps in the overlapping segments and by genomic Southern blots. All λ and cosmid recombinant molecules derived from the Oregon-R libraries are suffixed with "R" (not to be confused with the designation used for the entry probes above); otherwise, they are derived from the Canton-S library (Maniatis et al., 1978).

One of the reasons for using recombinant libraries from both the Oregon-R and Canton-S Drosophila strains is that the genes residing in pDt73 (474) and pDt16R (774 and 777) represent three of the four permutations of the tRNA4,7Ser structural gene sequences thus far encountered at 12DE. I have compared these homologous genes between the two fly strains to ascertain empirically the dynamics of genetic exchange at this chromosomal site. It was reasoned that if the elapsed time separating the strains is sufficiently long and if the genetic exchange process is dynamic, then other permutational forms may exist at the homologous sites.

The results showed that this gene cluster is composed of eight tRNA4,7Ser and six tRNAArg genes. Despite repeated efforts, this chromosomal site remained as four separate domains and has not been joined successfully into a single coherent region. One reason could be that this region is dense with repetitive sequences, which may not be easily maintained in most E. coli hosts used in library construction. None of the X-linked tRNASer genes recovered showed the configurations anticipated from reciprocal exchanges, and the homologous genes from Canton-S and Oregon-R remained identical in their structural sequences. A portion of the results contributed to the data published in Cribbs et al. (1987b).

RESULTS

1. (A) Chromosomal Walk in the pDt73 Region

At least 33.5 kb of contiguous sequence was isolated, almost all from the Canton-S λ library (Maniatis et al., 1978). Each positive phage clone occurred at the expected frequency of one in 12,000 (or one per genome) before impasses in both directions were encountered (fig. 12). In an attempt to overcome the impasse, the 2.0 kb EcoRI fragment was isolated from λ736 as a probe (coordinate 4.5 to 6.5) to screen a cosmid library. This library contained BamHI-cut Oregon-R DNA cloned into the cosmid vector pJB8. Of 40,000 colonies screened, one isolate was obtained. However, this isolate, cos40.1R, overlaps with the λ clones by ~25 kb, giving only 1.5 kb of new sequence extending to the left (coordinate 0 to 1.5). An attempt to establish further overlapping clones using the 0.5 kb BamHI+HincII fragment (from coordinate 0 to 0.5) to screen both the Canton-S library and a newly constructed Oregon-R lambda library failed to reveal any positive phage.

A single tRNASer gene in the molecular walk was subcloned from λ731 as a 4.2 kb EcoRI fragment into pUC13 and sequenced with the gene-specific primers GT6 and GT7. GT6 is identical in sequence to nucleotides +1 to +14 of the non-coding strand of the tRNA4,7Ser genes (that is, corresponding to the tRNA sequence), while GT7 corresponds to nucleotides +40 to +58 of the coding strand (template). The primers yield complementary sequences within the structural genes (except at the priming sites) in addition to the 3' and 5' flanking sequences, respectively. To confirm the sequence data, the 3.7 kb HindIII+EcoRI restriction fragment containing the tRNA gene was also subcloned into pEMBL8+ and sequenced using oligonucleotide F1. As in the original Oregon-R isolate in pDt73, the Canton-S gene is also a hybrid 474 gene based on the three diagnostic nucleotides (fig. 13). The gene is designated as pCS474.
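The screening frequency quoted for this walk (one positive per 12,000 phage, or one per genome equivalent) can be turned into a rough coverage estimate. The sketch below uses the standard library-coverage relation P = 1 - (1 - f)^N; only the value f = 1/12,000 comes from the text, and the function names and the 99% target are assumptions chosen for illustration.

```python
import math


def hit_probability(n_clones: int, f: float) -> float:
    """Probability that at least one of n_clones carries a given
    single-copy sequence, when each clone carries it with probability f."""
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(p: float, f: float) -> int:
    """Number of clones to screen for a probability p of at least one hit."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))


F_SINGLE_COPY = 1 / 12_000  # frequency quoted in the text for this library

# Screening one genome equivalent does not guarantee a hit.
p_one_equivalent = hit_probability(12_000, F_SINGLE_COPY)

# Illustrative target: 99% chance of recovering a single-copy sequence.
n_for_99_percent = clones_needed(0.99, F_SINGLE_COPY)

print(f"P(hit) after one genome equivalent: {p_one_equivalent:.3f}")
print(f"Clones for 99% confidence: {n_for_99_percent}")
```

On this estimate a single genome equivalent recovers a given single-copy sequence only about two times in three, which helps explain why several genome equivalents are usually screened before a region is judged absent from a library.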
Comparison of the 5'-flanking sequence in the two different fly strains shows only 2.8% divergence. Differences are predominantly single nucleotide substitutions or small deletions of one to two nucleotides. A poly-T putative termination signal occurs 19 base pairs 3' to the structural gene in pCS474 but was removed in pDt73 (Cribbs, 1982) during cloning as the result of a fortuitous HindIII site (beginning at nucleotide 250 in fig. 13) between the structural gene and the poly-T sequence. Hence, no comparison of 3'-flanking sequence diversity is possible.

Fig. 12. Molecular walk in the pDt73 domain. Approximately 33.5 kb was obtained before impasses were encountered. The coordinates of the walk are shown in the top line, measured in kilobases (kb). The dashed line shows the relative location of the entry probe, pDt73, in the walk. The single hybrid tRNASer gene, 474, is depicted as an arrow head pointing in the direction of transcription. The restriction sites BamHI (B), EcoRI (E) and HindIII (H) in the chromosome region are shown above the thick line. Below, the series of overlapping λ phages from the Canton-S library and the cosmid clone from the Oregon-R library are individually identified with numbers. The tick marks at the bottom represent the distribution of the sites for PstI, HincII, SstI and XbaI. The reiteration of these four restriction sites, beginning at coordinate 24, marks the left-most boundary of the Stellate sequences described in the text.

tttatattta gttateagtt etgsaattee aatttatatt ateagcttag  +50
attttgcaca agatatggaa aatacttttt gtttttgtaa attaatataa  +100
tactcttaac tttatattag tttcttaaat tttattgata ttttttttgc  +150
gcatatatca agGCAGTCGT GGCCGAGTGG TTAAGGCGTC TGACTAGAAA  +200
TCAGATTCCC TCTGGGAGCG TAGGTTCGAA TCCTACCGAC TGCGtttgta  +250
agcttaattt gtatttttac aaacaaaaaa aaatactatt ataattatag  +300
tagcctcacc gcggaaattg tatatgtaag tgcatt

Fig. 13. DNA sequence of the tRNASer 474 gene from Canton-S, which is identical to its homologue in Oregon-R. The structural sequence of the gene is shown in capital letters. The three diagnostic nucleotides at positions 16, 34 and 77 within the gene are underlined. The HindIII site (AAGCTT), situated in the trailer sequence, is at nucleotide 250 in the diagram (dotted underline).

(B) Interspersed and Tandemly Repeated Elements

For convenience, many of the original walking probes were simply prepared as EcoRI fragments. However, probing with some of these fragments almost always retrieved a confounding background of "false positive" clones bearing varying degrees of cross-homology with the original probes, suggesting that the pDt73 region contains many sequences that are present elsewhere in the genome. A subset of these sequences, particularly those in the immediate vicinity of the 474 gene, were usually short (<1.0 kb) but highly redundant or shared only a limited degree of homology with other sequences in the genome. When probes containing these repeated elements were radiolabelled and hybridized to Southern blots of restriction enzyme cleaved genomic DNA, autoradiography revealed only the bands corresponding to the unique sequence DNA fragments present in the probe. The repeated sequences present in the probe hybridized weakly throughout the genome and so contributed only a feeble background to the autoradiograph (for an example, see fig. 29). Repeated sequences fitting this description have been characterized by Pirrotta et al. (1983) and were termed "repetitious", as opposed to the longer "repetitive" elements sharing extensive and more conserved homology.

In order to obtain unique sequences as walking probes, each recombinant clone was subjected to fine structure mapping using PstI, HincII, SstI and XbaI (fig. 12, bottom) and repetitious elements were localized by "reverse" Southern (Pirrotta et al., 1983). In this procedure, the restriction enzyme cleaved recombinant molecules were resolved by gel electrophoresis and the DNA fragments transferred onto a sheet of cellulose nitrate or nylon membrane. Total genomic DNA radiolabelled by nick-translation, serving as the probe, was hybridized to the filter-bound DNA; fragments containing repetitious elements gave a stronger signal than expected due to the additive contribution from many genomic sites (for an example, see fig. 60). From such mapping and hybridization studies, the walk can be partitioned into two zones based on the organizational pattern of the repeats, with an abrupt transition boundary at approximately coordinate 24. The repeated sequences to the left of the boundary appear to be interspersed within the walk and elsewhere in the genome, whereas to the right of the boundary the repeated elements are tandemly arranged.

The interspersed repeats are illustrated by two cases below, which show opposite reiteration patterns. The 1.5 kb EcoRI+BamHI fragment from coordinate 0 to 1.5 contains sequences that are dispersed throughout the region left of the boundary, as shown by its intense hybridization encompassing the entire cos40.1R clone (fig. 14, panel C, lane 7). In contrast, the distribution of the 2.0 kb EcoRI fragment (coordinate 4.5 to 6.5) appears to be restricted to an 8.1 kb region in the cosmid clone (coordinate 14.5 to 23.6) (fig. 14, panel B, lane 7). Both fragments, in addition, also share limited homology with λ746, derived from a chromosomal walk in a separate region at 12DE (fig. 14, panels B and C, lanes 1-6).

To the right of the boundary, the region is composed of 1.3 kb tandem repeats characterized by the recurrence of restriction sites for the enzymes HincII, SstI and XbaI (fig. 12), but refractory to cutting by the more commonly employed enzymes EcoRI, HindIII, BamHI or PstI.
When one repeating unit was isolated as an SstI fragment from λ735 and hybridized to genomic DNA cleaved with EcoRI, HindIII and BamHI, predominantly one intense band larger than 23 kb was detectable within an hour of exposure (data not presented). There are also some minor diffuse bands of reduced intensity at approximately 3.0 kb and in the 4.5 kb range in both the HindIII and the BamHI digests. When the genomic Southern hybridization was repeated using DNA from the D. melanogaster mutant Df(1)glfB/In(1)AM, heterozygously deficient for the chromosomal bands 11F10-12F1, the hybridization intensity of the large molecular weight band was decreased by 50% (fig. 15). Thus, a large fraction of these repeats appears to exist as a large contiguous cluster on the X-chromosome. This mapping experiment also shows that the smaller 3.0 and 4.5 kb hybridization bands are probably Y chromosome-specific, since they are only evident in the male DNA [In(1)AM/Y]. When the same SstI repeat unit was used to probe genomic DNA from D. erecta, D. teissieri and D. yakuba, three distantly related sibling species of melanogaster (see phylogenetic tree, fig. 28), there was total absence of hybridization (data not shown). The simplest explanation would be that these repeats have been acquired only recently in the melanogaster group.

The above observations bear some resemblance to those obtained by Livak (1984) with a similarly cloned sequence known as Stellate that is tandemly reiterated 200 times at polytene bands 12F. Due to the difference in the choice of restriction enzymes in the present mapping studies, it was difficult to determine if the sequences were identical to Stellate. However, both sets of sequences share a 1.3 kb repeated pattern of HincII sites and both show the conspicuous absence of restriction sites for the enzymes EcoRI, HindIII, BamHI and PstI. As well, the sequences in his studies are also species-specific, being confined to the melanogaster group. Recently, a clone, pSX1.3, containing one unit of the XbaI repeat was obtained from Dr. K. Livak. Hybridization experiments showed that it is homologous to the SstI fragment of λ735 (data not shown). Hence, it appears that the X-chromosome segment 12E to 12F in D. melanogaster is occupied by a large block of Stellate sequences.

Fig. 14. Interspersed repeated sequences shared between the pDt73 and pDt17R domains. Panel A shows restriction digests of λ746, derived from the pDt17R molecular walk. (Lane 1): HindIII; (lane 2): HindIII + BamHI; (lane 3): EcoRI; (lane 4): EcoRI + BamHI; (lane 5): BamHI; (lane 6): HindIII + EcoRI. (Lane 7) is cos40.1R cut with BamHI + EcoRI. Some faint bands present in the digests are probably the result of minor cross-contamination by λ1722 DNA (a clone overlapping with λ746, see fig. 16) during loading. Panel B shows the hybridization pattern of the 2.7 kb EcoRI fragment isolated from the pDt73 chromosome walk at coordinates 1.5 to 4.2. This fragment is repeated at least once internally in the same domain, as shown by its hybridization to the 3.9 kb EcoRI fragment at coordinate 18.6 to 22.5 (lane 7), but it also shows very strong homology to a region in the pDt17R walk mapped to coordinate 25 (see fig. 16). Panel C shows the hybridization pattern of the 1.5 kb BamHI+EcoRI fragment from coordinates 0 to 1.5 in the pDt73 chromosome walk. This fragment has a hybridization pattern that is the reverse of that of the probe described in panel B. It contains sequences that are reiterated many times within the pDt73 domain (lane 7), but that show only limited homology with sequences in the pDt17R walk. The localization of these sequences in the pDt17R walk has not been precisely determined but there are at least two copies, one of which occurs near the 777 gene in λ746 (4.0 kb EcoRI band in lane 3) and the other, with much stronger homology, is located in a 3.5 kb EcoRI fragment >20 kb downstream from the 777 gene (lane 3). This smaller band was later shown not to be derived from λ746, but from the next overlapping phage, λ1722, present as a cross-contaminant in the gel lane during loading.

Fig. 15. Hybridization of a 1.3 kb SstI fragment corresponding to one repeat unit of the Stellate sequences to fly strains deficient for polytene bands 12DE. Top panel: lane 1 is DNA from In(1)AM/Y (male) [note that In(1)AM is a rearranged X-chromosome but has no deletion]; lane 2 is DNA from Df(1)glfB/In(1)AM (female heterozygous for the deletion at 12D1-12F1) (see fig. 52 for cytology map); lane 3 is DNA from In(1)AM/In(1)AM (female with no deletion). All genomic DNAs were cut with HindIII prior to resolution by gel electrophoresis. The size markers are HindIII-generated λ DNA fragments, as shown in the left-most gel lane (kb, kilobases). After gel electrophoresis, the DNAs were transferred to a sheet of Hybond nylon membrane and hybridized to a nick-translated SstI probe, corresponding to one unit of the tandem repeats. After removal of excess probe by washing (see Methods and Materials), the bands were detected by autoradiography for approximately 5 hours at -70 °C. The membrane was then reprobed with nick-translated pDt5 as an internal control to monitor the total amount of DNA loaded in each lane. This second probe contains a 4.2 kb HindIII segment of Drosophila DNA derived from polytene bands 23E on chromosome 2L and should not be affected by the deletion (Cribbs et al., 1987b). After removal of excess probe as above, the membrane was exposed for 5, 10 and 15 hours. The figure shown here is from the 10 hour exposure. The two bands labelled "A" and "B" are the main Stellate band and pDt5, respectively. Middle panel shows the quantitation of hybridization bands in the Df(1)glfB/In(1)AM mutant as determined by scanning with a Bio-Rad Video Densitometer (Model 620). Bottom panel is a similar scan, except that the genotype is In(1)AM/In(1)AM. The shaded areas indicated in the densitometer tracings were cut out and the areas compared by weighing (the actual peaks used in the weighing were about 4-fold the areas shown in the figure). The results showed that approximately 50% of the hybridization intensity in band A was removed by the deletion at the 12D1-12F1 region, relative to the internal control pDt5. Since the response of the X-ray film may not be in the linear range, it is not certain whether the entire cluster of Stellate sequences has been removed on one homologue in the mutant Df(1)glfB/In(1)AM, but the result does suggest that a large proportion of the Stellate cluster overlaps with the deleted region. In lane 1, containing male DNA, unique band clusters are observed at positions corresponding to ~3.0 kb and 4.5 kb to 5.0 kb, which are almost certainly Y-chromosome specific regulatory sequences as discussed in Livak (1984).

2. (A) Chromosomal Walk in the pDt17R Region

pDt17R was originally cloned as a 10 kb HindIII fragment propagated in the recBC E. coli host SF8, as reported by Dunn et al. (1979b). Subsequent culturing of the plasmid in another host, C600 (rec+), yielded a deleted variant containing only a 4.7 kb HindIII insert, reported by Cribbs (1982). It is possible that the original insert may have contained inverted repeats that were recognized and cleaved by the functional recBC+ restriction system in the latter E. coli strain (Boissy and Astell, 1984; Leach and Stahl, 1983; Nader et al., 1985).
Nevertheless, using the deleted variant as a starting point, overlapping phages were initially obtained from the Canton-S library. Three lambda clones, λ171, λ746 and λ1722, were obtained, but at a low frequency of approximately 1 in 20 to 40 Drosophila genome equivalents of phages, before impasses were met (fig. 16). Many of the sequences contained within λ746 (see fig. 14 a and b), and particularly the end fragments in λ1722 at coordinate 40 to 45, are repeated and share homology with cos40.1R in the pDt73 region discussed above. No more authentic overlapping clones were obtained from the Canton-S library despite exhaustive screening.

The phage λ1730R was obtained from an unamplified EcoRI partial library cloned into λEMBL4. The other phage, λ1731R, was obtained from an MboI partial library cloned into the BamHI site of λEMBL3. Further screening of the λEMBL3 library using the EcoRI+SalI fragment from coordinate 0 to 1.5 as a probe cross-reacted with 17 different putative phages that are unlikely to be overlapping clones, based on limited restriction analyses and Southern hybridization. However, some or all of them could represent variants of authentic overlapping clones, rearranged to different extents as a result of unstable sequences; they were not analyzed further.

Fig. 16. Chromosomal walk in the pDt17R domain. At least 45 kb of DNA was collected from two different λ libraries before impasses were encountered. The coordinate line in kilobases is indicated at the top of the diagram. The residual DNA segments in the deleted version of pDt17R are shown as dashed lines "a" and "b", which were fused during the deletion events. Hence, there were at least two independent deletion events, between coordinates -12 to -16 and between coordinates -17 to -17.5, that led to the deleted version. The restriction sites for the chromosomal domain are indicated above the thick line: B, BamHI; E, EcoRI; H, HindIII; S, SstI; X, XbaI. Three tRNASer genes, 444*, 774 and 777, are found in this region, and their directions of transcription are indicated by small arrow heads. In addition, one tRNAArg gene is associated with a BamHI site, approximately 200 bp upstream from the 444* gene. The direction of its transcription is shown by the wide stemmed arrow. The λ phages representative of the chromosomal region are shown at the bottom as overlapping lines and are denoted individually by numbers. All coding regions analyzed were derived from λ171.

(B) Localization of tRNASer Genes Within the Walk

Of the three overlapping phages (λ1731R, λ171 and λ1722) representing the entire 44.5 kb, only λ171 contained tRNASer genes, clustered within an expected 10 kb HindIII fragment (see summary fig. 51, lane 8). Digests with either EcoRI alone or EcoRI and HindIII in combination showed three bands of hybridization, suggesting that at least three genes are contained within this phage (data not presented). The deleted fragments within pDt17R were mapped by comparing restriction sites within the walk or plasmid subclones of relevant regions and further confirmed by hybridization studies. The results are summarized in the chromosomal walk depicted in fig. 16. Of the two similar sized residual fragments ("a" and "b") that escaped the deletion events, only fragment "b" hybridizes to both pDt17R and tRNA4,7Ser gene-specific probes (using both plasmids and gene-specific oligonucleotides). This is assumed to represent the remnant of pDt17R that contains the original tRNA7Ser gene. The other gene-containing EcoRI fragments of 4.6 kb and 1.8 kb that have been deleted in pDt17R were subcloned into pUC13 and designated as pE4.6 and pE1.8, respectively.
Both plasmids were digested with HaeIII, DdeI and HpaII and probed with oligonucleotide GT7. The Southern hybridization results were consistent with each plasmid containing only a single tRNASer gene. The tRNA genes within the subclones were mapped by an improved strategy, the Oligonucleotide Indirect Labelling method (OIL), as described (consult Methods and Materials, p. 102). pE4.6 was cleaved to completion with PvuII and PstI in combination to release the Drosophila insert. After inactivation of the enzymes by heating at 68 °C for 15 minutes and removal of the salts by micro-dialysis over TE, the digest was divided into three aliquots. Restriction buffers were appropriately adjusted and then 1-2 units of the "mapping" enzymes HinfI, HpaII and TaqI, which cleave within the tRNA4,7Ser structural gene, were added to each of the three respective reactions. At various time increments, small aliquots were removed and the restriction endonuclease inactivated by adding EDTA to a final concentration of 50 mM. The DNA fragments were resolved by agarose gel electrophoresis, transferred onto a sheet of Hybond nylon membrane for about 3 hours (Wahl et al., 1979; Meinkoth and Wahl, 1985) and then probed with the 32P-radiolabelled universal sequencing forward primer, F1. The sizes of the hybridization bands thus reveal the spatial distribution of the restriction sites in question, relative to the one end point of the restriction fragment that is indirectly labelled with F1. The results of such a mapping experiment are displayed in fig. 17. A putative tRNASer gene is localized at approximately 0.5 kb from the EcoRI site (or 0.6 kb from the PvuII end) based on the diagnostic HinfI, TaqI and HpaII cleavage patterns.

Fig. 17. Restriction mapping of pE4.6 by the oligonucleotide indirect labelling method. The Drosophila DNA was first released by complete digestion with PvuII and PstI. The mixture of fragments was then partially cleaved with the "mapping" enzymes HinfI, TaqI and HpaII. All of these enzymes are known to cleave within the structural tRNA4Ser gene; except for HpaII, they also cut within the structural tRNA7Ser gene. The partially digested products from various time points (3' to 30') were then resolved by electrophoresis in a 1.5% agarose gel. Panel A shows a typical result after gel electrophoresis, where a complex pattern of bands comprising both Drosophila and vector DNA is seen. The DNA was then transferred onto a sheet of Hybond nylon filter and probed with F1, the universal sequencing primer. Panel B shows the hybridization pattern with the sequencing probe, which anneals specifically to, and so indirectly labels, only the Drosophila-specific DNA fragments close to the PvuII site. Panel C shows the restriction map of this 4.6 kb fragment reconstructed from the hybridization pattern in (B). RI marks the EcoRI cloning sites flanking the Drosophila insert. The small dotted line to the bottom left of the restriction map indicates the F1 hybridization site. HinfI, TaqI and HpaII sites are marked by distinct symbols above the map. The small arrow head indicates the site of the tRNASer gene, as revealed by almost overlapping sites for the mapping enzymes, and points in the direction of transcription from 5' to 3'. Size standards were generated from HindIII-cut λ and HinfI-cut pBR322 DNAs. The sizes corresponding to these fragments are shown on the right of panel B.

Similarly, pE1.8 was digested to completion with PvuI and PstI, which cut 5' and 3' to the cloned insert, respectively. The released insert was partially digested with HinfI, HpaII and TaqI, transferred onto nylon membrane and hybridized with F1 as described above.
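The logic of the indirect end-labelling experiment described for pE4.6 can be sketched as follows. Because only fragments that retain the F1-labelled end are detected, each partial-digest band size equals the distance from that end to a cut site, and a gene is inferred where sites for all three mapping enzymes nearly coincide. The band sizes, the 82 bp window (roughly the length of the structural gene) and the function names below are hypothetical values chosen for illustration, not measurements from the thesis.

```python
from itertools import product


def sites_from_partials(band_sizes_bp, full_length_bp):
    """Convert partial-digest band sizes into cut-site positions.

    With one end indirectly labelled, every detected band runs from the
    labelled end to a cut site, so band size equals site position. The
    uncut full-length fragment is excluded.
    """
    return sorted(s for s in band_sizes_bp if 0 < s < full_length_bp)


def locate_cluster(sites_by_enzyme, window_bp):
    """Return (start, end) of the tightest group containing one site for
    every enzyme within window_bp, or None if no such group exists."""
    best = None
    for combo in product(*sites_by_enzyme.values()):
        span = max(combo) - min(combo)
        if span <= window_bp and (best is None or span < best[1] - best[0]):
            best = (min(combo), max(combo))
    return best


# Hypothetical band sizes (bp) read from an autoradiograph, per enzyme.
FULL_LENGTH = 4600
bands = {
    "HinfI": [150, 640, 2100, 4600],
    "TaqI": [610, 1800, 4600],
    "HpaII": [590, 3300, 4600],
}

sites = {enzyme: sites_from_partials(b, FULL_LENGTH) for enzyme, b in bands.items()}
cluster = locate_cluster(sites, window_bp=82)
print(cluster)
```

A cluster is only a candidate location: the three recognition sites can coincide by chance elsewhere, which is why the localization in the text is confirmed independently by a second mapping method and by sequencing.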
The putative tRNASer gene is estimated to be ~1.0 kb from the EcoRI site (or 1.12 kb from the PvuI end) and it is displayed in fig. 18 (bottom). Both of the above gene-localization studies have also been independently corroborated by the well-established, but more cumbersome, method of Smith and Birnstiel (1976). The 1.5 kb and 0.9 kb EcoRI+XbaI fragments from pE4.6 and pE1.8, respectively, predicted to contain tRNASer genes, were purified from agarose gels. They were then end-labelled at either the EcoRI or the XbaI site with [α-32P]-dATP and [α-32P]-dCTP, respectively, using the Klenow enzyme. The HaeIII, HinfI, TaqI and HpaII restriction endonuclease sites within the fragments were then mapped by partial digestion under the conditions described in Methods and Materials (p. 101). The localization of the tRNASer genes by this latter method agrees entirely with predictions based on the OIL data obtained above (data not presented).

Fig. 18. Localization of the tRNASer gene in pE1.8 by the oligonucleotide indirect labelling method. The 1.8 kb EcoRI Drosophila insert was first released by cutting with PvuI at ~80 bp 5' to the sequencing primer site and at the PstI site 3' to the insert within the polylinker cloning site. The mixture of DNA fragments was partially digested with the "mapping" enzymes HinfI, TaqI and HpaII, exploiting the fact that all three sites occur within the coding sequence of the tRNA4Ser gene (except for the HpaII site, which does not occur in the tRNA7Ser gene). Aliquots were removed at various time points as indicated at the top of the autoradiograph (from 3' to 30') and the products resolved by gel electrophoresis in a 2% agarose gel. The DNA fragments were transferred onto a sheet of Hybond nylon and the 5' end of the Drosophila insert was indirectly and specifically labelled by the universal sequencing primer. The location of the tRNASer gene within the EcoRI insert is shown as a small arrow head at the left edge of the autoradiograph. At the bottom is the restriction map constructed from the sizes of the partial products. The ends of the Drosophila insert are indicated by the EcoRI cloning sites (RI). HinfI, TaqI and HpaII sites are marked by distinct symbols. The first HinfI and TaqI sites (closest to the priming site) are located within the polylinker and are omitted from the restriction map. The overlapping HinfI, TaqI and HpaII sites representing the gene are indicated by the arrow head below the restriction map, pointing in the direction of transcription. The dotted line to the left below the map indicates the primer site. Markers are HindIII-cut λ and HinfI-cut pBR322 DNAs. Their sizes are shown on the right of the top panel.

(C) Sequence Analyses of pE4.6 and pE1.8

Small aliquots of the two EcoRI+XbaI fragments used in the Smith and Birnstiel mapping studies described above were also cloned into the bacteriophage vector M13mp18. Since the fragments contain two different restriction ends, they can be inserted in a predictable orientation within the polylinker sequence. This experiment served two purposes. First, the smaller size of the insert should reduce the probability of deletions during the propagation of templates in the E. coli (rec+) strains JM101 or JM103 for DNA sequence determination, since the DNA segments had previously proven unstable when propagated in C600 (Cribbs, 1982). Second, the direction of tRNA gene transcription can be deduced by hybridization with the strand-specific oligonucleotide probes GT6 and GT7, which are homologous to different parts of the tRNA4,7Ser genes (see Methods and Materials, p. 107), and thus provides further supporting evidence for the mapping studies.
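The strand-specific probing logic can be made concrete with a short sketch. The oligonucleotide sequences are those read from fig. 11 (GT6 identical to nucleotides +1 to +14 of the non-coding, tRNA-like strand; GT7 identical to nucleotides +58 to +40 of the coding, template strand). The test strand is a hypothetical construct with arbitrary flanks and filler, and the function names are assumptions made for the example.

```python
GT6 = "GCAGTCGTGGCCGA"        # nt +1..+14 of the non-coding (tRNA-like) strand
GT7 = "CGCTCCCAGAGGGAATCTG"   # nt +58..+40, read from the coding (template) strand

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def anneals(oligo: str, single_strand: str) -> bool:
    """An oligonucleotide hybridizes only to a strand that contains its
    reverse complement."""
    return revcomp(oligo) in single_strand


def packaged_strand(ssdna: str) -> str:
    """Identify which strand of the gene a single-stranded template carries."""
    hits = {name for name, oligo in (("GT6", GT6), ("GT7", GT7)) if anneals(oligo, ssdna)}
    if hits == {"GT7"}:
        return "non-coding (tRNA-like) strand"
    if hits == {"GT6"}:
        return "coding (template) strand"
    return "undetermined"


# Hypothetical template: arbitrary flanks and filler around the two target
# regions, spaced so that the second region occupies nt +40..+58.
filler = "AT" * 12 + "A"                       # 25 nt standing in for nt +15..+39
toy_gene = GT6 + filler + revcomp(GT7)
toy_template = "GAATTCAA" + toy_gene + "TTTCTAGA"

print(packaged_strand(toy_template))
print(packaged_strand(revcomp(toy_template)))
```

Because the insert is force-cloned in a known orientation, knowing which strand is packaged fixes the direction of transcription relative to the two cloning sites.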
The 1.5 kb EcoRI+XbaI fragment of pE4.6 could be primed only with GT7 in both dot blot hybridization and DNA sequencing, while GT6 gave only weak smears resulting from non-specific hybridization in both cases. Both sets of results, along with the mapping studies above, provide conclusive evidence that the direction of transcription of the tRNA gene is from the EcoRI site towards the XbaI site, as depicted in the molecular walk in fig. 16. The initial DNA sequence data also revealed an RsaI site 28 bp 5' to the gene, and this enzyme was used to generate a 200 bp fragment that was cloned in pEMBL8-. Single-stranded template was prepared from the plasmid by superinfection with IR1 and sequenced using the reverse primer, RI. Subsequent confirmatory data were obtained by the supercoil sequencing method using both gene-specific primers. The sequence shows a tRNA4Ser-type gene based solely on the three distinguishing nucleotides. However, the gene sequence deviates slightly from that expected by a C→T transition at the tip of the extra arm at nucleotide 213 (the gene is therefore designated 444*), as shown in fig. 19. Also, a fortuitous box B promoter sequence, CGAAT, at position 229 is repeated three times at the 3' end of the structural gene (starting at position 245). Whether this duplicated promoter sequence has any influence on the transcription rate is not known. Within this EcoRI+XbaI fragment there is also a BamHI site corresponding to a tRNAArg gene, with the restriction site constituting part of the coding sequence immediately 3' to the anticodon (coordinate 15 in fig. 16). This gene, designated pArg12.5, was characterized by C. Newton (personal communication) and will not be reported here.

Similarly, in both dot blot hybridization and DNA sequencing, the 900 bp EcoRI+XbaI fragment from pE1.8 could be primed only with GT7. This again indicates that the direction of transcription is from the EcoRI site towards the XbaI site.
In this case, however, the gene is in the opposite orientation relative to the 444* gene. An MboI site conveniently located 25 bp 5' to the gene was identified from the initial sequencing and was used to generate small MboI inserts cloned into pEMBL8-. Positive clones were isolated and the plasmids were converted into single-stranded templates and sequenced using primer RI. This sequence was confirmed by the supercoil sequencing method described by Chen and Seeburg (1985), using the original pE1.8 as the template and the two gene-specific primers. The sequence shows a hybrid 774 gene, which is designated p17-774 (fig. 20).

3. (A) Chromosomal Walk in the pDt27R Region

The initial screens of the Canton-S and two different Oregon-R λEMBL3 libraries (which contained MboI partially sheared DNA from either adult flies or tissue culture cells) with pDt27R were unsuccessful, despite over 30 genome equivalents of phages being screened from each. However, one positive, λ272R, was eventually obtained from an unamplified EcoRI library cloned into λEMBL4 (fig. 21). "Walking" probes prepared from this phage again failed to detect any positives in any of the aforementioned libraries. 420R is a plasmid clone containing a polymorphic 17 kb HindIII fragment isolated from the D. melanogaster strain 420 (a gift from C. Newton), which extended the walk to the right for another 10 kb. CosP273R contains a 42 kb MboI insert extending to the left for another 30 kb. It was obtained after screening more than 20 genome equivalents of cosPneo clones. Since sequences from this region appear to be rare and contain many repeats (see summary, fig. 51 lane 6), no more attempts were made to extend the walk further.

+50 agtatgttaa tccttttatt atccUcaat ggatatttca atattggcaa
+100 taattattgt agcatcattt gatagttaca aattatglaa attttagcga
(RsaI) +150 cagtggaaaa gtgaaagtgg ctcgactttc aagtacgtaa tttgacacca
+200 gctataacaa gaaGCflGTCG TGGCCGflGIG GTTflflGGCGT CTGRCTCGRR
+250 RTCRGRTTCC CTTTGGGRGC GTRGGTTCGR RJCCTRCCGG CTGCGgatcg
+300 aatcgaattt tttacacttc gcatagagct accatatttt ttatgtgcgc
ctcaattaaa cttgatgaca aattaaagtc cgtcagtggg

Fig. 19. Nucleotide sequence of the 444* gene in pE4.6 from the Oregon-R strain. The RsaI site (GTAC) at nucleotide 143, which has been exploited to generate further subclones, is indicated above the sequence. The structural gene is depicted in capital letters. The T mutation at nucleotide 213 is actually a C in the tRNA, corresponding to position 50 in the tRNA structural gene (non-standard nomenclature). The Box B promoter sequence, CGAAT, at position 228 within the gene is also repeated three times starting at nucleotide 244 (dotted underline).

+50 caatatta atgaaaaatc tgaaaaaatt aaccgagtca cgactttaaa
+100 tcacttgaat taatcgaatg aatgaactgc gattttggtc tataaattga
(MboI) +150 acgtgtggaa gggggcacag aaaaatttct ggatctggat ggcaaatgtc
+200 ttcgccaaGC flGTCGTGGCC GRG£GGTTRH GGCGTCTGRC TflGflflRTCfiG
+250 RTTCCCTCTG GGRGCGTRGG TTCGfiflTCCT flCCGGCTGCG tttaatgcta
+300 taattttagc ttaatttaga tacttacact gagaaaaaaa accgcaatga
+350 tgcaatatca tttaaaaata aataaaacag aaagtaatta attttttcaa
ccaaatcaga ctaatcttag t

Fig. 20. Nucleotide sequence of the 774 gene in pE1.8 from Oregon-R. The MboI site (GATC) used in subcloning is shown above the sequence. The structural gene is in capital letters and the three diagnostic nucleotides are underlined.

(B) Localization of tRNA Genes Within the Walk

Almost the entire Drosophila insert contained within pDt27R was sequenced by C. Newton (1984). It has been shown to contain two identical tRNA4Ser genes clustered near one end of the insert. From alignment of restriction sites and Southern blotting, the two tRNASer genes in the corresponding chromosomal walk have been localized to the HindIII+BamHI fragment at approximately coordinate 40 (fig. 21). Since the equivalent genes have not been sequenced, they can only be assumed to be tRNA4Ser genes.
From his sequence analysis, Newton also showed a cluster of four BamHI sites approximately 600 bp downstream from the tRNA4Ser genes. As discussed for the pDt17R chromosome walk, these also correspond to four duplicated tRNAArg genes (designated pArg12.1 to pArg12.4 in fig. 21 and Newton, 1984). The similar arrangement of BamHI sites within the walk is assumed to reflect a coincident arrangement of the four tRNAArg genes. From Southern analysis of cosP273R, an additional tRNAArg gene (pArg12.6) has been localized within a 360 bp HindIII fragment at coordinate 28 (fig. 21). This fragment was subcloned into pEMBL8- and sequenced by the modified double-stranded method described by Hattori and Sakaki (1986). The sequence is displayed in fig. 22. The structural sequence is identical to that of pArg12.5 in the pDt17R walk, but both differ from the cluster of four duplicated tRNAArg genes at coordinate 40 by a C→T transition at position 13 within the A block promoter region. However, the recent cloning and sequencing of the entire family of tRNAArg-related genes showed that the C13 nucleotide is actually the exception, peculiar only to the duplicated genes within pDt27R (C. Hunter Newton, manuscript in preparation).

4. (A) Chromosomal Walk in the pDt16R Region

Only three λ phages were isolated from the Canton-S library before impasses were met. λ1161 occurred at the expected frequency of one per genome equivalent of phages, while λ1162 and λ1163 occurred at frequencies of one per 40 genome equivalents of phages (fig. 23).

Fig. 21. Chromosomal walk in the pDt27R domain. More than 56 kilobases of genomic DNA from this region were collected before impasses were met. The coordinate line in kb is shown at the top of the figure. The interruption in the map denoted by two slashes to the left indicates unmapped DNA, which is known to be composed largely of repeated DNA (see fig. 60, lane 6) and shows no detectable hybridization with total 4S RNA (a gift from V. Dartnell). The entry probe for initiating the walk is indicated by the dashed line. The chromosome region is shown as a thick line with its restriction sites (B, BamHI; E, EcoRI; H, HindIII) above. The two tRNASer genes are indicated by small arrow heads pointing in the direction of transcription. Five tRNAArg genes have been identified (wide stemmed arrows). Four duplicated copies of a tRNAArg gene are associated with the cluster of four BamHI sites 600 bp downstream from the tRNASer genes. These genes are expanded below the chromosome map and the relative sizes of the duplicated units are delimited by ticks. At the DNA sequence level, these boundaries are composed of a short direct repeat, TAGCCCAA (see fig. 55). The Arg12.1 and Arg12.2 genes are 600 bp in size, while Arg12.3 and Arg12.4 are 200 bp, with a single large deletion in the 5'-flank relative to the larger units. Another solo copy, which is transcribed in the opposite direction, is associated with a BamHI site at coordinate 28 to the left. The duplicated tRNAArg genes are distinct from the solo copy and the rest of the gene family at nucleotide 13, as described in the text. At the bottom are the overlapping recombinant clones, none of which was present in the Maniatis library (Maniatis et al., 1982). The inverted triangle in the clone 420 indicates a frequent HindIII restriction polymorphism detectable in numerous commonly used laboratory Oregon-R stocks, including those used in the deficiency localization of the tRNA genes (see fig. 52).
[Map: genes 444-1 and 444-2, Arg12.6 and Arg12.1 to Arg12.4; restriction sites B, E and H; clone cosP273R.]

+50 aagcttcgtt tcgcgttgaa actgaatttt ttgcaattca acccttccca
+100 cttattatag ttttcgttct gttctcacta gcaaatgttc tcactccagt
+150 ttctctcgcc tctccctctt tatatttgtt gttacggcct ggtaatccaa
+200 ctGHCCGTGT GGCCIflflTGG flTRflGGCGTC GGflCTTCGGR TCCGflflGRTT
+250 GCflGGTTCGfl GTCCTGTCflC GGTCGaccgc tctatctttt ttttaatatt
+300 catattttcc ttgagctatg aatattacag cttttattaa ttggccaagt
caattgctgc

Fig. 22. Sequence of pArg12.6 of D. melanogaster. The fragment was cloned as a 360 bp HindIII fragment and sequenced with universal primers Ft, RI and also the two gene-specific oligonucleotides Arg5' and Arg3'. The structural gene is depicted in capital letters and the T13 (at nucleotide 165) is emphasized by underlining. The characteristic BamHI site (GGATCC), constituting part of the coding sequence 3' to the anticodon, begins at nucleotide 188.

Using pDt16R again as the probe, λ2161R was isolated from the MboI λEMBL3 library after screening about 20 genome equivalents of phages. No more authentic clones were obtained from any of the λ or cosmid libraries using various probes spanning the entire λ1161. As predicted from the pDt16R (Oregon-R strain) restriction map (St. Louis, 1985), each tRNASer gene is located on an HhaI fragment of identical length (980 bp). This was also confirmed by digesting λ1161 from the Canton-S strain with HhaI and Southern blotting using GT7 as the probe. To determine whether other permutational forms of tRNA4,7Ser genes exist at the equivalent site in this fly strain, the HhaI fragments were isolated from an agarose gel, their ends were trimmed with S1 nuclease, and they were then ligated into the SmaI site of pUC13. Sequencing results showed that one corresponds to a 777 gene (fig. 24) while the other corresponds to a 774 gene (fig. 25), identical to those originally identified in pDt16R. Comparison of corresponding flanking sequences between the Canton-S and Oregon-R fly strains showed the expected 3-4% sequence divergence.
Again, the differences consist mostly of single nucleotide substitutions and insertions or deletions of one to two base pairs. The HhaI fragments from the two strains were presumed to be identical based solely on the criterion of size. In fact, they are not. From restriction mapping (St. Louis, 1985) and sequence analysis of pDt16R (Cribbs et al., 1987b), it was predicted that the HhaI site should be 70 bp downstream from the 777 gene. In the Canton-S sequence, however, the expected recognition site GCGC has been altered by the insertion of an extra nucleotide to give GCTGC (fig. 24). A new HhaI site was probably created 154 bp downstream by a nucleotide substitution changing the sequence GTGC to GCGC in the Canton-S strain. This is inferred from the other cloned HhaI fragment containing the 774 gene, in which the sequence adjoining the SmaI cloning site is AAACCAATTT (nucleotides 1-10 in fig. 25). This block of nucleotides occurs three base pairs downstream from the hypothetical HhaI site. The site was probably removed, along with the preceding three base pairs, by the S1 or other contaminating nucleases during the trimming step. Since both tRNASer genes in Canton-S were also localized to two HhaI fragments of identical size, there would necessarily have been two other fortuitous and compensating changes which created new HhaI sites to maintain the parity in size of the two fragments.

Fig. 23. Chromosomal walk in the pDt16R domain. At least 22 kilobases of genomic DNA were collected. As in the previous cases, impasses were met repeatedly in all genomic libraries and no further attempts were made to extend the walk. The coordinate line, measured in kb, is indicated at the top of the figure. The dashed line represents pDt16R, the entry probe used to initiate the walk. The two tRNASer genes, 774 and 777, are shown as small arrow heads pointing in their direction of transcription. The chromosome region is displayed as a thick line with the restriction sites BamHI (B), EcoRI (E), SalI (Sl) and HindIII (H) marked above it. The polymorphic sites E and H are indicated as inverted triangles in the recombinant phages λ2161R and λ1162, respectively.
[Map: coordinates 10 to 20 kb; genes 774 and 777; restriction sites E, B, H and Sl; clones λ2161R, λ1161, λ1163 and λ1162.]

+50 gctaccactt ggcgtaataa aatcaaatta gtggaaacag aaaatatttc
+100 gagtttatga agataaaaaa attcattgaa caaacgtcaa ctattttcac
+150 cttcatagcc attatcatcg accactcatt gcttactcag ctttttatgc
+200 ctatatctta caatagacgc cccgatcctc aaaagcgatc caatcttctt
+250 ttcatgccaa cttgacgatc cgcgatcatt aaGCRGTCGT GGCCGRGCGG
+300 TTRRGGCGTC TGRCTRGRRR TCRGRTTCCC TCTGGGRGCG TRGGTTCGRfl
+350 TCCTRCCGRC TGCGaatagt aatctgtttt ttggaagtcc agaaaataga
+400 tcgacagaag atcagaaaaa gtattaagaa gctgctctct tataatgctt
+450 aaaaaatatt tcgtagtaaa agagtgaagt gtgtggcaaa taaaatcatg
cacctttgta aagttactga tat

Fig. 24. DNA sequence of the pCS16-777 gene from Canton-S. The sequencing strategy is described in the text. The structural gene is depicted in capital letters and the three diagnostic nucleotides are underlined. The mutated HhaI site (GCGC→GCTGC) is located at position 381, 67 nucleotides downstream from the 3'-end of the structural gene (dotted underline).

+50 anaccaaUta acttttttgo Gtttaatcat tatctattgU oaGooagt
+100 gatattaata gttatacgat cgacttttcg ctataaaaag atcagtgata
+150 ttaatgtagc tagagtcggg taataaagcc tctggagtca tcaaaGCRGT
+200 CGTGGCCGAG CGGTTRRGGC GTCTGRCTRG RRRTCRGRTT CCCTCTGGGfl
+250 GCGTRGGTTC GRRTCCTRCC GGCTGCGgtt tataagtgcc aattttttta
+300 aaataattaa gccaaactaa taaattcaaa aggtaacatc attaggaata
+350 tatataaaac acaatttttt agtattaaat tagttataca atagtttttt
tgcaatcctt gtgttatgca atctgtaag

Fig. 25. DNA sequence of the pCS16-774 gene from Canton-S. The 5'-end of the insert is near the polylinker cloning site. The sequence corresponding to the mature tRNA is presented in capital letters. The three diagnostic nucleotides are underlined. The proposed new HhaI site is probably three nucleotides upstream from the 5'-end of the insert, where the sequence GTGC in Oregon-R could have been mutated to GCGC in the Canton-S strain.

In summary, a total of eight tRNASer and six tRNAArg genes have been uncovered at 12DE (see Table V in Discussion). Two of the tRNASer genes, mapping to the pDt17R domain, have not been described previously. Neither of these two genes (a 444* and a 774 gene) corresponds to the reciprocal hybrid sequences anticipated from regular exchanges. Thus, assuming that no differential selection operated on the different recombinant products, the observations remain consistent with the conversion model postulated by Cribbs (1982) and Cribbs et al. (1987b) (also see Discussion concerning the autosomally-linked tRNA4Ser genes). Unfortunately, the equivalent genes isolated from Canton-S, in the hope of imparting new insight into the dynamics of genetic exchange, only managed to arouse pedestrian curiosity, since they are identical to their Oregon-R counterparts. It is not known whether the turnover of this gene cluster is particularly slow, since the rates of turnover for other multigene families have only been reported from comparisons between the Drosophila sibling species, rather than between strains (reviewed by Dover, 1982). This idea will be further explored in Chapter II (Part II), where the pDt73, pDt27R and pDt16R homologous regions from different species will be compared.

The overall organization of the two gene families appears to be typical of all known Drosophila tRNA gene clusters (for example, see fig. 3). The 12DE site reported here is at least 157 kb in size, plus the intervening DNA between the four domains, which was not successfully retrieved.
The most plausible reason for the failure to link the domains is that the high density of repeated sequences encountered in the walk may be poorly tolerated in most of the common E. coli hosts used in propagating recombinant DNA. Nevertheless, it is almost certain that all of the tRNASer genes have been cloned from this site (Cribbs et al., 1987b; see Discussion) and that the 474 gene is closest to the centromere, based on its proximity to the Stellate sequences.

CHAPTER II

Flanking Sequence Relatedness in the tRNA4,7Ser Genes in D. melanogaster and Sibling Species

The repertoire of hybrid genes retrieved from 12DE is entirely consistent with the hypothesis of nonreciprocal recombination (or, specifically, gene conversion), but does not constitute definitive proof. Thus, other hallmarks have been sought in the tRNA4,7Ser gene cluster to provide independent evidence to either support or confute gene conversion as a viable hypothesis. In Part I of this chapter, the flanking regions of all the tRNA4,7Ser genes at 12DE were compared systematically. Since conversion is known to occur predominantly via intrachromosomal transmission of genetic information (see Discussion), it was expected that the hybrid genes would retain sequence signatures in their flanking regions tracing to the original parental genes in the cluster that may have participated in the conversion events. To be sure, as in Chapter I, this analysis would again only provide circumstantial evidence consistent with conversion events and would not eliminate reciprocal recombination as a plausible alternative. Therefore, a second and more robust approach was undertaken in Part II of this chapter to distinguish between the two possible hypotheses more convincingly. To achieve this, genomic DNAs were prepared from five other Drosophila sibling species (simulans, mauritiana, erecta, teissieri and yakuba; see fig. 28) and probed with pCS474 and pDt16R.
Obviously, if the hybrid genes were derived from unequal (but reciprocal) exchanges between the bona fide tRNA4Ser and tRNA7Ser genes, then one would expect at least some of these genes to have been gained or lost during the long evolutionary history of the tRNA4,7Ser gene cluster (between 13 and 37 million years). Conversely, if gene conversion were operative, then the gene families from the different sibling species would be unlikely to fluctuate in size or, at least, not as a result of the gene conversion events themselves. Also, segments homologous to pCS474, pDt16R and pDt27R have been cloned from D. erecta, and the latter two segments from D. yakuba, to define precisely the nature of their genes at the DNA sequence level. Because these two sibling species diverged from melanogaster approximately 13 and 37 million years ago, respectively, they should provide more suitable candidates for seeking alternative forms of hybrid genes at the homologous sites. Furthermore, extensive analyses of the cohesive evolution of other multigene families, driven mainly by the mechanism of biased gene conversion termed "sequence homogenization" (Strachan et al., 1982 and 1985; Dover, 1982), have documented that the sequences flanking the gene members within a species consistently show a higher degree of sequence conservation than those between different species. This predicted mode of multigene family evolution, which appears to be widespread, can be tested directly in this study for the tRNA4,7Ser genes by inter- versus intra-species comparisons of their flanking sequences.

The results obtained in Part I confirmed the presence of distinct homology blocks in both the 5'- and 3'-flanking sequences shared among the X-linked tRNA4,7Ser genes in D. melanogaster, strongly suggesting that they are evolutionarily related.
In fact, in all of the proposed recipient genes involved in the conversion events, the multiple homology blocks always maintained proper spatial alignment relative to those in the putative donor genes. This type of patchwork of homology blocks is indeed consistent with the prediction based on intrachromosomal conversion events (see Discussion). In the cross-species analyses addressing fluctuation in the size of the tRNA4,7Ser gene families and the intra-specificity of flanking sequences (Part II), the results again provided convincing evidence consistent with the model of cohesive evolution via biased gene conversion. No gain or loss of the X-linked tRNA4,7Ser genes (or of the interspersed tRNAArg genes) has thus far been detected in the DNA segments homologous with the three melanogaster plasmid probes. Sequence comparisons of homologous genes from two other sibling species showed that more conserved sequences were shared by members of a multigene family within a species, precisely conforming to the prediction of "sequence homogenization". Thus, all of the separate lines of evidence tend to forge a strong argument for gene conversion, rather than standard reciprocal recombination, as the viable model. Portions of the results presented here have been published in Cribbs et al. (1987b).

RESULTS

Part I - Homologies in the 5'-Flanking Regions of the melanogaster Hybrid Genes: Wedded Patchworks

5' homology elements that are diagnostic of either tRNA4Ser or tRNA7Ser genes were first noted by Newton (1984). Depending on their location relative to the start (+1) of the structural genes, they were known as either -5 or -20 boxes. Briefly, both 444 genes within pDt27R contain the consensus sequences AAPyAA at about the -5 position and TTGGGPyT at about the -20 position. In contrast, the 777 genes have the consensus sequences ATPyAA at about -5 and CAAPyTT at about -20 (Newton, 1984; Cribbs et al., 1987b).
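The consensus boxes just described lend themselves to a simple pattern search. The following Python sketch is only an illustration under stated assumptions: the four consensus strings are taken from the text, but the function names and the example flank `toy_flank` are hypothetical and do not correspond to any clone in this study. "Py" is read as "C or T", so each consensus becomes a regular expression that can be scanned along a 5'-flanking sequence.

```python
import re

# Consensus boxes as given in the text (Py = pyrimidine, C or T).
CONSENSUS = {
    "444": {"-5": "AAPyAA", "-20": "TTGGGPyT"},
    "777": {"-5": "ATPyAA", "-20": "CAAPyTT"},
}


def consensus_to_regex(consensus):
    """Turn a consensus such as 'AAPyAA' into a compiled regex 'AA[CT]AA'."""
    return re.compile(consensus.replace("Py", "[CT]").replace("Pu", "[AG]"))


def find_box(flank, consensus):
    """Return 0-based start positions of non-overlapping matches in flank."""
    pattern = consensus_to_regex(consensus)
    return [m.start() for m in pattern.finditer(flank.upper())]


def boxes_present(flank):
    """Report, for each gene type, where its -5 and -20 boxes occur."""
    return {
        gene: {box: find_box(flank, cons) for box, cons in boxes.items()}
        for gene, boxes in CONSENSUS.items()
    }


# Hypothetical flank built to contain one 444-type -20 box and one -5 box.
toy_flank = "GGTTGGGCTACGTAGGAAACAAGT"
print(boxes_present(toy_flank))
```

The search reports positions counted from the start of whatever string is supplied; converting them to coordinates relative to +1 of the structural gene would require knowing where the flank ends, which this sketch leaves to the caller.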
For the hybrid 774 and 474 genes and the singly altered 444* gene, no obviously paired homology boxes belonging to the exclusive domain of either the 444 or the 777 genes can be observed. Rather, their 5'-flanking regions are a wedded patchwork of sequences, embracing characteristics of both gene types.

(A) The 474 Gene is Most Closely Related to the 444-1 Gene

The 30 nucleotides in the immediate 5'-flanking region of the 474 gene share strong homology with the corresponding region of the pDt27R 444-1 gene, but show much more divergence from the 444-2 gene. At the -5 position of the 474 gene, the sequence ATCAAG resembles the -5 homology element AACAAG in the 444-1 gene, with one base pair mismatch, but no shared -20 element is detectable (fig. 26, lines 1a vs 2). However, other significant homology blocks can be observed: at about position -15 in the 474 gene, the sequence TTGCGCA again shares strong homology with the corresponding sequence TTGCGTA (1/7 mismatch) in the 444-1 gene. The sequence TATTGATATT at about position -30 in the 474 gene is held in common with a similarly positioned TAGTG-TATT in the 444-1 gene (2/10 mismatches). The major discord in the -10 to -30 region is confined to the replacement of the core tetranucleotide sequence GGGC of the -20 box by either four (Canton-S, line 1a) or two (Oregon-R, line 1) T nucleotides in the two 474 genes. These latter nucleotide changes create a sequence context resembling a 777-type -20 box, but they may simply reflect fortuitous mutational events. Beyond -30, little homology can be detected.

Positions: -30 -20 -15 -5
1. Oregon-R 474 (pDt73): t aaat t tTRTTGRTRTTtt--TTGCGCRtatRTCRRG
1a. pCS474: t aaat ttTRTTGRTRTTt11 tTTGCGCRtatRTCRRG
2. pDt27R 444-1: aTRGTG-TRTTgggcTTGCGTRggaRRCRRGta
3. pDt27R 444-2: ccggaagattgTTgggaTTtgatccaaflfilRR
Positions: -33 -20 -13 -5
4. pDt17R-777: ct ctTGCRCCTctt gaact caat 111 cGCCRCccaccCRTCRR
5. pDt16R-774: 11 aaTGTRGCTagagccgggtaat aagGCCRCt agagtCflTCRRa
5a. pCS16-774: 11 aaTGTRGCTagagtcgggt aat aaaGX£J£tggagtCRTCRRa
Positions: -27 -20 -5
6. pDt16R-777: 11 ctTT-CRTGccaacttgaccat ccgcgaTCRTTRfl
6a. pCS16-777: ttctTTTCRTGccaacttgacgatccgcgaTCRTTRfl
7. pDt17R-774: aaaaJJIfhlfigat ct ggat ggcaaat gt ctTCGCCRR
Positions: -20 -15 -5
8. pDt17R-444*: gtacgtRRTTT--GRCRCC-RgctRTRRCRRGRR
9. pDt17R-777: gaact cRRTTTtcGCCRCCCRcccRTcaa
10. pDt27R 444-1: at agt gt at t gggct t gcgt aggaRflCRRGJfl

Fig. 26. 5'-flanking homologies in tRNA4,7Ser genes in D. melanogaster. Short 5'-flanking regions of genes from chromosomal bands 12DE are aligned to highlight homology blocks (underlined). Genes derived from the Canton-S strain are denoted with "CS". The gene in line 1 is the same as pDt73. Note that all homology blocks are spatially aligned in linear order even though all comparisons involve non-allelic genes.

(B) The 774 Gene in pDt16R is Most Closely Related to pDt17R-777

From nucleotides -2 to -7 in the pDt16R-774 gene, the -5 sequence CATCAA matches exactly that in the pDt17R-777 gene; but again, as with the 474 and the 444-1 genes above, no corresponding -20 homology element is shared between them (fig. 26, lines 4 and 5). Two other tracts of nucleotide homology are evident, beginning at -14 (GCCAC) and at -35 (TGTAGCT) in the 774 gene. Similar sequence tracts are also situated at almost identical positions in the 777 gene (-13 GCCAC and -34 TGCACCT, respectively). Note that the single A to T base difference in the -14 box is not between the non-allelic 774 and 777 genes (lines 4 vs 5) but between the identical 774 genes from the two different fly strains (lines 5 vs 5a).

(C) The 774 Gene in pDt17R is Possibly Related to pDt16R-777

The seven nucleotides, TCGCCAA, immediately 5' to the 774 gene are only remotely similar to the sequence TCATTAA (3/7 mismatches) in the 777 gene, and the resemblance may not be significant.
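The mismatch fractions quoted in these comparisons (for example 3/7) can be reproduced by counting differing columns between two aligned strings. The Python sketch below is a minimal illustration, assuming the two blocks are already aligned to equal length with "-" marking a gap; the function names are hypothetical, and the example blocks are the ones quoted in the text above.

```python
def count_mismatches(a, b):
    """Count columns that differ between two pre-aligned, equal-length blocks.

    A gap ('-') opposite a base counts as one mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned blocks must have the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def percent_identity(a, b):
    """Percentage of identical columns between two aligned blocks."""
    return 100.0 * (len(a) - count_mismatches(a, b)) / len(a)


# Blocks quoted in the text: (query, reference)
pairs = [
    ("ATCAAG", "AACAAG"),    # 474 vs 444-1, -5 element
    ("CATCAA", "CATCAA"),    # pDt16R-774 vs pDt17R-777, -5 element
    ("TCGCCAA", "TCATTAA"),  # pDt17R-774 vs pDt16R-777
]

for query, ref in pairs:
    n = count_mismatches(query, ref)
    print(f"{query} vs {ref}: {n}/{len(query)} mismatches, "
          f"{percent_identity(query, ref):.0f}% identity")
```

Such counts say nothing about statistical significance, particularly for blocks only six or seven nucleotides long in an A+T-rich region, which is why the text treats the weaker matches with caution.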
However, another short tract in the 774 gene, TTTC-TG, beginning at -28, shows strong homology to TTTCATG, which occupies an identical position in the 777 gene (1/7 mismatches) (fig. 26, lines 6a vs 7). Hence, there is weak evidence to suggest that the 774 and the 777 genes are related. However, the 774 gene in pDt17R shows no homology to any of the other available 5'-flanking sequences, including those from three of the four autosomal tRNA7Ser genes (Cribbs et al., 1987b; D. A. R. Sinclair, unpublished observations).

(D) The 444* Gene Has a Patchwork 5'-Flanking Region Characteristic of pDt17R-777 and 444-1 Genes

The remaining 444* gene is not a hybrid gene based on the three diagnostic nucleotides, but does have a single mutation at the tip of the extra arm (C50→T50). Nevertheless, its 5'-flanking region has characteristics of both 777 and 444 sequences. As shown in fig. 26, the sequence from -1 to -9 in 444*, AACAAGAA, matches almost exactly AACAAGTA at the same position of the pDt27R 444-1 gene (lines 8 vs 10). On the other hand, strong homology exists between the sequence GACACC-A beginning at position -14 in the 444* gene and the sequence GCCACCCA beginning at position -9 in the pDt17R-777 gene (2/8 mismatches, lines 8 vs 9). Another region of homology exists farther upstream between the same genes. Note that at about the -20 position in the 444* gene, the sequence AATTT resembles the -20 element in the 777 gene. Hence, it appears that at least one extragenic recombinatory event between the -5 and the -9 region may have engendered the hybrid homology pattern observed in the 444* gene. For convenience, the 444* gene will be included in the discussion of hybrid genes.

Sequence Homologies in 3'-Flanking Region

Homology elements in the 3'-flanking sequences, other than the most prominent poly-T termination signals, have not been previously recorded by either Cribbs et al. (1987b) or Newton (1984).
However, homologies are present in trailer sequences of some of the tRNA S e r genes, including all of the hybrid genes. As shown in fig. 27, the pDtl7R-774 gene contains the sequence TTT- A ATGCT AT A ATTT within the first 20 nucleotides immediately 3' to the gene that is homologous to a similarly positioned sequence TTTGTAAGCT-TAATTT of the Canton-S 474 gene (4/17 mismatches, lines 1 vs 2). Even if the poly-T termination signal was not considered to eliminate possible bias, there remains an overall homology of approximately 72%. In addition, both genes also share a long poly A sequence tract, albeit at somewhat different positions, that is absent in all other tRNASer 3- flanking sequences examined. However, since the poly A tracts are not positionally aligned and due to the high A-T content in the region, this apparent conservation could be fortuitous. The other 774 gene in pDtl6R also shares a truncated homology block, TTTATAAG, with the above genes starting from nucleotide +2 (fig. 27, lines 3 and 3a), With the exception of the poly T termination signal situated in almost the same position, there does not seem to be other significant homology. 151 +10 +20 +30 +15 1. pDt17R-774 TTT-RRTGCTRTRRTTTtagcttaatttagatacttacactgagRRRRRRRR 2. pCS471 TTTGTRRGCT-TRRTTTgtatttttacaaacRRRRRRRRRtQcta 3. pCS16R-774 gJJTRTflRGtgccaatttttttaaaataattaagccaaact 3a,pDt16R-774 qTTTRTRRGtgccactttttttttaataattaaaccaaQct +6 +20 4. pDt27R 444-1 RRTgaGRRTGTR-TRTTTTRtttcaaatgtttttattttctgaaat 5. pDt17R-444* RRTc-GRRTCGRRTTTTTTRcacttcgcatagagctaccatatttttta 6. pDt27R 411-2 gaagggtattcctatattttttatgttttaaaaggtgcattcttacagt 7. pDt17R-777 atatgaagagtatcttttttatgtcagatacttttatgtatctatgggat 8. pCSI6-777 aatagtaatctgttttttggaagtccagaaaatagatcgacagaaga 8a.pDt16R-777 aat agcaat ct gt11111ggaagt ccagaaaaaat agat cgat agaa Fig . 27. 3'-flacking homologies of non-allelic genes of D, melanogasterfrom chromosomal bands 12DE. 
As in previous figure, "CS° denotes genes that are derived from Canton-S, as in lines 3 and 8, otherwise, they are from Oregon-R clones. 152 There is conservation between the sequence, AATG^GAATGTA-TATTTTA, just outside the pDt27R 444-1 gene and the sequence AAT£GAATCGAAITITITA at the same position in the 444* gene (6/20 mismatches, lines 4 vs 5). Except for the regular intrusion of the poly-T termination signals, the 3'-tails from the rest of the bona fide genes pDt27R 444-2, pDtl7R-777 and pDtl6R-777 (lines 6, 7, 8 or 8a, respectively, in fig. 27) are composed of unique sequences, and therefore unrelated to any of the above genes. There is also an intriguing correlation between the tRNA47S e r diagnostic nucleotides at positions 16 and 77 within the coding regions of the hybrid genes and their immediate 5' and 3' flanking sequences. If positions 16 and 77 are both diagnostic of a tRNA4$e r gene (as in the 444* and 474 genes), then both flanking sequences would also be tRNA4Ser-like. Alternatively, if positions 16 and 77 are diagnostic of tRNA7$e r a n ( j tRNA4$er, respectively, then the 5'- and 3-flanking sequences would also be switched accordingly (as in both 774 genes). Thus, it appears that the identical hybrid nature of the two diagnostic nucleotides within the genes and their flanking sequences are inherited together. Part II- Examination of Loci Homologous to PCS474 and pDt!6R in Drosonn/taSibVme. Species: (A) Detection of Homologous DNA By Genomic Southern Hybridizations: D. metanogasteris a member of a small subgroup consisting of eight closely related sibling species that is virtually cosmopolitan in its geographic distribution. Morphologically they are very similar. The only source of reliable distinguishing features are the slight variations in their male genitalia. The evolutionary relationships among these sibling species, as shown in fig. 
28, are constructed primarily from polytene chromosome banding patterns but have also gained support from species hybridization studies (David et al., 1974), electrophoretic mobilities of certain enzymes (Gonzalez et al., 1982; Ohnishi and Voelker, 1983; Eisses et al., 1979) and from their mitochondrial, ribosomal and satellite DNA polymorphism (Barnes et al., 1978; Strachan et al., 1982; Coen et al., 1982). The arborescent topology depicting their relationships can be divided into two species complexes. D. melanogaster, D. simulans, D. mauritiana and D. sechellia have similar chromosomes. Only one large inversion on chromosome 3R distinguishes the chromosomes of D. melanogaster from those of the other species in this complex. The chromosomes of the remaining four species differ from those of the D. melanogaster complex by at least eight inversions. D. yakuba, the most dissimilar from D. melanogaster, differs by at least 30 inversions (Ashburner et al., 1984).

Fig. 28. Evolutionary relationship among the eight species of the Drosophila species subgroup based upon their polytene chromosome banding patterns (Bodmer and Ashburner, 1984). The numbers on the diagram indicate the minimum number of autosomal inversions that have accompanied each cladistic event. Thus, within the D. melanogaster complex only one large inversion distinguishes the chromosomes of melanogaster itself from those of the other three species. The chromosomes of the members of the yakuba complex differ from those of the melanogaster complex by at least eight autosomal inversions. From left to right, the symbols in the diagram are: yak, yakuba; tei, teissieri; ere, erecta; ore, orena; sim, simulans; mau, mauritiana; sec, sechellia; mel, melanogaster.

Genomic DNAs prepared from D. melanogaster, D. simulans, D. mauritiana, D. erecta, D. teissieri and D. yakuba were digested with the restriction enzymes HindIII, EcoRI and BamHI and the fragments were resolved by agarose gel electrophoresis.
After blotting by the modified method of Southern (1975; see Methods and Materials, p. 100), the filter-bound DNA was probed sequentially with pCS474 and pDt16R, which had been radiolabelled by nick-translation. Hybridization was initially carried out at low stringency since it was not known whether homologous sequences existed in species other than D. melanogaster. To detect specific hybridization, the probes were subsequently removed by sequential washing under increasing stringencies. The filter-bound DNA was first challenged with pCS474 at 42 °C in standard hybridization buffer as described (see Methods and Materials, p. 93). After the filter was washed at 42 °C in standard washing buffer (1x SSC/0.5% SDS), only featureless smears appeared in all channels (data not shown). At slightly increased stringency (58 °C wash), specific bands were discernible above the still considerable background. When the washing temperature was raised further to 65 °C, bands from specific hybridization were quite evident in almost all species represented in the EcoRI and BamHI digests (fig. 29). Hybridization is weak in all HindIII digests, presumably due to some loss of DNA during the ethanol precipitation steps just prior to loading the gel, although specific bands could still be detected. However, for D. yakuba, the sibling species most distantly related to D. melanogaster, no specific hybridization could be observed even at this stringency except for a persistent smear ranging from about 7.0 kb to 23 kb. It is known that pCS474 contains several different types of repetitious elements; thus, it is plausible that the high background in general, and in D. yakuba in particular, is due to strong cross-homology with these repeated elements.

Fig. 29.
Genomic Southern blot of Drosophila sibling species subgroup with probe pCS474. Lane 1: melanogaster; lane 2: mauritiana; lane 3: teissieri; lane 4: simulans; lane 5: erecta; lane 6: yakuba. The DNAs from the various sibling species were digested to completion with EcoRI, HindIII and BamHI and resolved in a 0.6% agarose gel. After Southern transfer (see Methods and Materials), the filter-bound DNAs were probed at 42 °C and then washed successively at 42 °C, 58 °C and 65 °C. The hybridization bands were visualized after each wash by autoradiography at -70 °C with an enhancer screen for approximately 3-4 days. Only the exposure after washing at 65 °C is shown here. HindIII-cleaved λ DNA and HinfI-cleaved pBR322 were used as size markers. The heavy smears along the edges of the figure are the result of over-exposure of the size standards generated by HinfI-cleaved pBR322.

[Fig. 29 autoradiograph: EcoRI, HindIII and BamHI digests, lanes 1-6 in each panel; size scale in kb.]

When pDt16R was used as the probe, observations similar to those above were obtained (fig. 30). Specific hybridization bands were discernible after washing at 58 °C in standard washing buffer as above (not shown). After further washes under more stringent conditions (65 °C in 1x SSC/0.5% SDS and then 68 °C in 0.2x SSC/0.1% SDS), essentially all background was removed and the specific hybridization bands remained intense in all species, including D. yakuba.

Two Drosophila sibling species, D. erecta and D. yakuba, were selected for further molecular cloning studies of the pDt16R and pCS474 homologous loci. The time of separation of the two sibling species from D. melanogaster is difficult to estimate. Depending on the choice of mathematical treatment, it appears that D. yakuba diverged from D. melanogaster between 13 and 37 million years ago (Ashburner et al., 1984). Other extrapolations (Bodmer and Ashburner, 1984) suggest that D.
erecta is more closely related to D. melanogaster and probably diverged from the latter between 2 and 13 million years ago. When all three sibling species are considered together, they should, in theory, present an ideal incremental time span of DNA sequence evolution, as represented by the two genetic sites.

(B) Isolation of pCS474 Homologous Fragment from D. erecta

A library containing D. erecta DNA partially digested with BamHI and then cloned into the vector λEMBL3 was screened with nick-translated pCS474. Of a total of 100,000 plaques screened, only one strong positive was identified. The phage, designated λDE73, contains a 17 kb insert composed of two different 8.5 kb BamHI fragments. One of the fragments, which showed positive hybridization to both pCS474 and tRNASer gene-specific probes, was subcloned into the BamHI site of pEMBL18+ and mapped by standard restriction digests (data not presented). Southern blotting of the subclone showed that a tRNASer gene was located on a 2.0 kb fragment bounded by a PstI and a BamHI site (fig. 31). This fragment was cloned into pEMBL8- and used as the template for dideoxy sequencing by both the single- and double-stranded methods. The sequence indicates that it is a 474 gene (denoted as pDe474 in fig. 32), identical to its counterpart in pCS474.

Fig. 30. Genomic Southern blot of Drosophila sibling species subgroup with probe pDt16R. Lane 1: melanogaster; lane 2: mauritiana; lane 3: teissieri; lane 4: simulans; lane 5: erecta; lane 6: yakuba. The filter from fig. 29 was treated in 0.4 M KOH at 45 °C to remove the old probe and then neutralized at the same temperature in 0.1x SSC, 0.1% (w/v) SDS, 0.2 M Tris-HCl (pH 7.5) for 30 minutes. The conditions for Southern hybridization with pDt16R are as described in fig. 29, except that the last wash was carried out at 68 °C before autoradiography, as shown here.

Unfortunately, the homologous DNA segment could not be isolated from λ libraries of D. yakuba using either pCS474 or the D.
erecta 1.5 kb MI insert as a probe, due to cross-reactivity with a large number of other phages. This observation is consistent with the previous genomic blot, in which potentially authentic hybridization bands in D. yakuba (e.g. BamHI digest) could have been obscured by repetitive sequences. In the same genomic Southern blot, a 15 kb BamHI band could be clearly detected in D. teissieri. Since this species is closest to D. yakuba cladistically (fig. 28) and appears to have less cross-homology with repetitive elements, I thought it might serve as a possible alternative. This fragment, though, also proved elusive, even though the library was constructed from gel-purified BamHI-cleaved genomic DNA ranging from 9.5 kb to 23 kb in size, and a total of 250,000 recombinant plaques were screened. I have not made further attempts to clone these fragments using E. coli host strains other than Q359 (recA+).

(C) Isolation of pDt16R Homologous DNA Segments from D. erecta

Of approximately 100,000 plaques screened with the nick-translated plasmid, again only one positive (recovered as three identical copies) was isolated. The restriction map of the phage, designated λDE16, is shown in fig. 33. As predicted, the 8.0 kb and one of the two 1.6 kb BamHI fragments observed earlier in the genomic Southern blot showed homology to pDt16R (labelled as the "z" region in fig. 33). In addition, probing with GT7 confirmed the presence of tRNASer genes within a 3.5 kb region bounded by the HindIII and BamHI sites. In the same blot, the gene-specific probe also revealed additional tRNASer genes in the left-most 1.6 kb BamHI fragment, which is not homologous to pDt16R. When the phage was rehybridized with plasmids used earlier as entry probes in the molecular walk, it clearly showed extensive homology with pDt27R (labelled as the "y" region in fig. 33).
Comparison of the two Southern blots probed with pDt16R and pDt27R indicates that the junction between the two regions is situated within the 2.6 kb PstI+HindIII fragment (dotted part of region "z" in fig. 33). Since this restriction fragment shows strong hybridization with pDt27R but considerably weaker signals with pDt16R, the junction is very likely to be closer to the HindIII site. Thus, in contrast to D. melanogaster, the spacer DNA separating the two regions is much

[Fig. 31 restriction map: BamHI, SstI and PstI sites with the 474 gene marked; scale bar 1.0 kb.]

Fig. 31. Subclone of the 8.5 kb BamHI fragment from λDE73 from D. erecta. The restriction sites are: B, BamHI; S, SstI; P, PstI. There are no EcoRI or HindIII sites in this subclone. The single 474 gene, denoted by the arrow head, is located between the SstI and PstI sites but the exact location is not known. The direction of transcription was deduced by differential hybridization of gene-specific oligonucleotides to single-stranded templates of the BamHI-PstI fragment.

+50 ccagttatag gtatttattt atttaaggct gcttttaagt tatattcatt
+100 agttatcagc aacgaaattc caatttatat tatcagctgt ggctttgcat
+150 aagctalcgg aagtgttttt gttttataaa laaatactat atcaacatta
+200 tattatttta gggcactctt aacgattcat accaaaGCflG TCGTGGCCGfl
+250 GTGGTTRRG6 CGTCTGflCTfi GRRflTCRGRT TCCCTCTGGG RGCGTRGGTT
+300 CGflflTCCTflC CGGCTGCGtt tgcgagctag atttttgtcc aaaaaaataa
+350 ttaaagtagg aaatacatgt gccataagtg cattccagtg ttggctatcg
+400 cacgaaaaga agtgcactta aataccatat actgccgagt tattttaatc
+450 aagacatcga aatgcctaat ataaaaaagg atttttatat taagcataac
atttgcaaag tgatgttcat tatttgtacc cttgcgtata attgt

Fig. 32. Nucleotide sequence of the 474 gene from D. erecta. The capital letters indicate the structural gene beginning at position 187 and the three diagnostic nucleotides are underlined.

[Fig. 33 restriction map: tRNASer genes 444-1, 444-2, 774 and 777 and the tRNAArg genes marked above the BamHI, HindIII and PstI sites of λDE16; scale bar 1.0 kb.]

Fig. 33. Restriction map of λDE16. Restriction sites for BamHI (B), HindIII (H) and PstI (P) are shown.
The lines "y" and "z" underscore the regions of the λ clone that are homologous to pDt27R and pDt16R, respectively. The dotted part of line "z" symbolizes weak hybridization with the pDt16R probe, suggesting that the homology extends only part way into the 1.5 kb HindIII-PstI fragment. However, this same fragment shows very strong hybridization to pDt27R. The tRNASer genes found in this clone are shown above as small arrow heads pointing in their directions of transcription. The 444-1 and 444-2 genes are homologous to the D. melanogaster pDt27R 444-1 and pDt27R 444-2 genes, respectively. The Arg-1 gene in this clone, depicted as a wide-stemmed arrow, is very likely to be the antecedent that evolved into the four duplicated counterparts in melanogaster (Arg12.1 to Arg12.4 in fig. 21). Abutting the λ vector arm on the left is the 3'-half of another tRNAArg gene (Arg-6), which is homologous to Arg12.6 in melanogaster. However, this erecta gene is transcribed in the opposite orientation, and it is much closer to the tRNASer genes, relative to its melanogaster counterpart. The 774 and 777 genes in this clone are homologous to the corresponding genes in pDt16R from D. melanogaster. The exact locations of the two genes are not known since the resolution of the restriction fragments in the gel was poor during mapping, but both are very near the HindIII site.

+50 taagaatgtg acattagtag ttatgtgatc ggtttttttt ttttctataa
+100 aaatatcggt gatatgggcc ttaaagtcgg gtgattaggc cacaagtgtc
+150 atccaaGCflG TCGTGGCCGR GCGGTTRRGG CGTCTGRCTR GRRRTCRGRT
+200 TCCCTCTGGG RGCGTRGGTT CGRRTCCTRC CGGCTGCGgt tgtaaatact
+250 attttacttc gaacaagtaa accaaacatt tgaagcaaaa aggttacagt
+300 atagagaata actaattatg caacaattgt taaaaaacct aactctggaa

Fig. 34. Nucleotide sequence of the 774 gene from D. erecta. It was cloned as a HindIII-DdeI fragment and the 5'-end of the Drosophila insert is immediately adjacent to the polylinker cloning site.
The structural gene is depicted in capital letters and the three diagnostic nucleotides are underlined.

+50 ctttttatgc ctatgccttg aaatagagcc cccaatcccc aaaaactatt
+100 caatcgtgtt ttcagccaac ttggcgatcg gtgatcatta aGCflGTCGTG
+150 GCCGRGCGGT TRR6GCGTCT GRCTRGRRRT CRGRTTCCCT CTGG6RGC6T
+200 RGGTTCGRRT CCTRCCGRCT GCGatagaaa cttgtttttt tttggaattt
+250 ccgaaaataa tgcaagatcg gaaagtataa tatttagaag atatcttatg
+300 ctgtttaaat atatgtcatg gtgaaaacat aaagtgtata gtaagtgaaa
ttatgcatta aaaaatatat aact

Fig. 35. Nucleotide sequence of the 777 gene from D. erecta. The fragment is cloned as a DdeI fragment. The structural gene is depicted in capital letters and the three diagnostic nucleotides are underlined.

reduced by deletion of at least 18 kb in D. erecta. The 3.5 kb HindIII+BamHI fragment with homology to pDt16R was subcloned into pEMBL18+. Restriction analysis and Southern blotting of the subclone with the enzymes DdeI, RsaI and HaeIII indicate that there are probably only two tRNASer genes (data not presented). The two smallest DNA fragments containing the separate genes, identified by further blotting (data not shown), were a HindIII+DdeI fragment of 400 bp and a DdeI fragment of 550 bp. The ends of the two fragments were filled in with all four dNTPs and the fragments were subsequently cloned into the BamHI site of pUC13, which had been similarly repaired prior to ligation. As shown in fig. 34, the 400 bp DdeI+HindIII fragment contains a 774 gene (pDe774), while the 550 bp DdeI fragment contains a 777 gene (pDe777) (fig. 35). From the sequence data and the sizes of the templates, the two genes are situated at least 200 base pairs apart, but have not been mapped precisely.

(D) Isolation and Sequencing of the D. erecta tRNASer Genes Homologous to pDt27R

As mentioned in the above section, λDE16 also contains other tRNASer genes on the 1.6 kb BamHI fragment abutting the left arm of the phage (arbitrary orientation in fig.
33). This fragment and adjoining sequences to its right, totalling 6.3 kb, share strong homology to pDt27R. Digestion of this 1.6 kb BamHI fragment with the restriction enzymes HpaII, Sau3A, AluI and HindII, followed by hybridization with GT7, indicated that the insert contains at least two tRNASer genes. These genes were cloned as 350 bp and 700 bp AluI fragments by blunt-end ligation into the filled EcoRI site of pUC13. Sequence analysis showed that the small and the large AluI fragments contain tRNASer genes identical to 444-1 (fig. 36) and 444-2 (fig. 37) in pDt27R, respectively. From Southern blotting and mapping experiments (summarized in fig. 38), the arrangement of the two 444 genes relative to each other and to the flanking BamHI cloning sites (corresponding to two tRNAArg genes, see Chapter III) is virtually identical to that in D. melanogaster. As will be discussed later, the conserved spatial arrangement of the different tRNA genes would argue against simple reciprocal exchanges between tRNA4Ser and tRNA7Ser genes as the likely mechanism for creating the hybrid

+50 ctaagttcgc tgagaaatta gaatcttgtc tagggtattg ggcacgcaga
+100 acaacatgta GCflGTCGTGG CCGflGTGGTT flflGGCGTCTG RCTCGflflflTC
+150 RGRTTCCCTC TGGGRGCGTfl GGTTCGARTC CTRCCGGCTG CGgagtaaat
+200 ctttatttta tttagaagta tttttttttt attttttttt taaatttatt
+250 tttgatgttt ttattttagc cagaaattaa actaatatat gttattgaaa
+300 tagaattttc aacataacag cacatgtgaa agttaggtgt tttaatgcat
aUaattaat cgtgttacag aattatcgtt ctttaaagat c

Fig. 36. Sequence of the D. erecta 444-1 gene. The 5'-end of the insert is adjacent to the polylinker cloning site. The sequence of the gene is depicted in capital letters. The three diagnostic nucleotides are underlined. The flanking sequences showed that it is homologous to the pDt27R 444-1 gene of D. melanogaster.
+50 ctagtatgtt aacctttgga accgaaattc gcataaaatc ccgaagattt
+100 ttggtattcg atcggtatga aGCAGTCGTG GCCGAGTGGT TAflGGCGTCT
+150 GACTCGAAAT CAGATTCCCT CTGGGflGCGT AGGTTCGAAT CCTACCGGCT
+200 GCGgatggaa catttatttt acataattcc taggagaggg ttacattttt
+250 gtgttccttt tgcttgacaa attcttcctg tctgctgaat cttttatcat
+300 ataacattat ataaaatttc tcattctaat cttattcaag caaccacatc
+350 tcaaattttt Uacgttacc tatttgtctg gcgttgcgtg gacttacaca

Fig. 37. tRNA4Ser gene of D. erecta homologous to the pDt27R 444-2 gene of D. melanogaster. The structural gene is depicted in capital letters and the three diagnostic nucleotides are underlined.

Fig. 38. Localization of the 444-1 and 444-2 genes in the 1.6 kb BamHI fragment of λDE16 by the oligonucleotide indirect labelling method. The BamHI fragment was released from the cloning vector by cleavage with BamHI (B). The enzyme HpaII (±), which cleaves at nucleotide 77 in the 3' complementary strand of the aminoacyl acceptor stem of the structural gene, was used to locate the gene within the insert. Also, since the two genes were previously subcloned as AluI fragments, this enzyme was used to distinguish the order of the two genes within the BamHI fragment (AluI, •). Aliquots of the partial digests with HpaII and AluI were removed at the time intervals indicated (from 3' to 30'). The products were resolved by electrophoresis in a 2% agarose gel and then transferred onto a sheet of Hybond nylon filter as described. Fragments specific to the Drosophila DNA were detected by hybridization to the oligonucleotide Arg3' (dotted line) followed by autoradiography. The two genes are shown as arrow heads pointing in the same direction of transcription. The numbers 1 and 2 indicate the equivalent 444-1 and 444-2 genes in erecta, respectively. Note that the spatial arrangements between the two tRNASer genes and between the 444-2 and Arg-1 genes at the right BamHI site in λDE16 are similar to those in pDt27R in D. melanogaster.

[Fig. 38 autoradiograph and restriction map of the BamHI fragment; scale bar 200 bp.]

genes.
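The mapping in sections (B) to (D) depends on where enzymes such as BamHI, HpaII and AluI cut relative to the genes. Once a fragment has been sequenced, the same sites can be located directly in the sequence. The following is a minimal sketch of that idea, not the procedure used in this work: the function names and the 30-nucleotide toy fragment are hypothetical, GAATTC and GAGCTC are the EcoRI and SstI sites quoted in the figure legends, and the remaining entries are the standard recognition sequences of the enzymes named in this chapter.

```python
import re

# Recognition sequences (5'->3'). "N" stands for any base.
RECOGNITION = {
    "EcoRI": "GAATTC",
    "SstI": "GAGCTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "HpaII": "CCGG",
    "AluI": "AGCT",
    "Sau3A": "GATC",
    "DdeI": "CTNAG",
}


def find_sites(seq, enzyme):
    """Return the 1-based start position of every recognition site in seq."""
    pattern = RECOGNITION[enzyme].replace("N", "[ACGT]")
    # A lookahead is used so that overlapping sites are all reported.
    return [m.start() + 1 for m in re.finditer("(?=" + pattern + ")", seq.upper())]


def site_map(seq, enzymes):
    """Return (position, enzyme) pairs sorted by position, like a restriction map."""
    return sorted((pos, name) for name in enzymes for pos in find_sites(seq, name))


# Hypothetical toy fragment; it is not taken from any clone described here.
TOY = "AAGGATCCTTTCCGGAAAGCTTGAATTCGG"

if __name__ == "__main__":
    for pos, name in site_map(TOY, ["BamHI", "HpaII", "AluI", "EcoRI"]):
        print(pos, name)
```

Comparing such maps between homologous fragments from two species is one way to express the "conserved spatial arrangement" argument numerically, since the distances between sites and genes can be read off the sorted list.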
(E) Isolation of pDt16R Homologous DNA Segment from D. yakuba

Of approximately 100,000 plaques screened, three different phages (λDY16-1, λDY16-3 and λDY16-82) were obtained (fig. 39). Only λDY16-82 has been used in the subsequent detailed gene analysis. Like λDE16, this phage also contains homologies to both pDt16R (region "x") and pDt27R (region "w"), although in this case their junctions are separated by at least a 2.2 kb SstI+HindIII fragment. The pDt16R homologous segment is localized within the 5.8 kb HindIII fragment (fig. 39), and contains two tRNASer genes. One gene has been cloned as a 300 bp Sau3A fragment and subsequently as a larger 2.5 kb SstI+BamHI fragment, identified earlier by standard restriction analysis and Southern blotting. The OIL mapping strategy applied to the latter fragment showed that the gene is very close to the SstI site at one end (fig. 40). Combined sequencing experiments performed on both the cloned Sau3A and SstI+BamHI fragments indicate that it is a 777 gene (pDy777) (fig. 41). From the sequence data, the direction of transcription is towards the EcoRI and SstI sites in the λ restriction map (fig. 39). The other gene was mapped to, and cloned as, an 800 bp Sau3A fragment and a 2.5 kb SstI+HindIII fragment (fig. 42). Combined sequencing experiments utilizing the two cloned restriction fragments showed that it is a 774 gene (pDy774) (fig. 43). Mapping experiments showed that it is also very close to the SstI site, but the exact distance from the 777 gene has not been determined. The direction of transcription was deduced by sequencing through the Sau3A sites in the 2.5 kb SstI+HindIII fragment (fig. 42).

(F) Analysis of pDt27R Homologous Region in λDY16-82

Blotting experiments using radiolabelled pDt27R as the probe detected strong homology in the left-most 5.0 kb region bounded by a SstI site in λDY16-82 (region "w" in fig. 39). Re-hybridization of the filter-bound λ clone cleaved with RsaI,
DdeI and HaeIII with tRNASer gene-specific probes (mp9Ser7 and oligonucleotides) showed that a single gene is confined within the left-most 0.8 kb BamHI fragment adjoining the λ vector arm. Sequencing

[Fig. 39 restriction map: genes 444-2, Arg-1, 774 and 777 above the BamHI, SstI, EcoRI and HindIII sites; regions "w" and "x"; phages DY16-1, DY16-3 and DY16-82; scale bar 1.0 kb.]

Fig. 39. Restriction map of the pDt16R/pDt27R homologous region from D. yakuba. The thick line shows the chromosomal region with the restriction sites marked above: B, BamHI; S, SstI; E, EcoRI; H, HindIII. Below are the regions "w" and "x", which underscore their homology with pDt27R and pDt16R, respectively. The dotted part of region "x" indicates weak hybridization to pDt16R in the 0.5 kb H-B restriction fragment. Unlike λDE16 in fig. 33, the two regions here clearly do not overlap, but are separated by at least 2.2 kb of intervening DNA. The three individual phages collected from this chromosomal site, and their extent of overlap, are shown below the map. Only λDY16-82 has been used in the detailed analysis of gene localization and DNA sequencing. The tRNASer genes are shown as small arrow heads, and the tRNAArg-1 gene is shown as a wide-stemmed arrow above the restriction map, pointing in their respective directions of transcription.

Fig. 40. Localization of the 777 gene in λDY16-82 by the oligonucleotide indirect labelling method. The 2.5 kb SstI-BamHI subclone was released from the plasmid vector by cutting with PvuII at about 70 bp 5' to the universal priming site and at the MI site within the polylinker cloning site. The mixture of fragments was treated with the "mapping" enzymes HinfI, TaqI and Sau3A. Aliquots were removed at the time intervals designated above the autoradiograph (top). The site of the tRNASer gene, indicated by the small arrow head to the left of the autoradiograph, was localized by the overlapping HinfI and TaqI sites and is also shown below in the restriction map.
The direction of transcription was deduced by sequencing through the SstI cloning site 3' to the gene. B, BamHI; S, SstI; (•), HinfI; (°), TaqI; (•), Sau3A. The dotted line below the restriction map shows the F1 priming site.

[Fig. 40 autoradiograph and restriction map of the SstI-BamHI subclone; scale bar 200 bp.]

+50 ctataaaatg ccgcattcaa tcagcaatcc tcatcaaaat aaaacaaacg
+100 tcaactactt ttaacttcat taccattatc atcgaccaca cattgcttac
+150 tcagctttta tgcctatacc ttgaaatagt ggccccaacc ccaacccccc
+200 aaaaaacgat ccaatcttgt tttcacgcta acttggcgat catgatcact
+250 aaGCRGTCGT GGCCGRGCGG TTRRGGCGTC TGRCTRGRflfi TCRGRTTCCC
+300 TCTGGGRGCG TRGGTTCGRR TCCTRCCGflC TGCGatgcat atgagttttt
+350 ttttggaatt ccaaaaatat agcaagatta gaaattatta gaagctagag
ctct

Fig. 41. Sequence of the 777 gene in D. yakuba. The structural gene is depicted in capital letters and the three diagnostic nucleotides are underlined. Two restriction sites are used to orient the direction of transcription in the molecular walk: the EcoRI site (GAATTC) at nucleotide 306 and the SstI site (GAGCTC) at nucleotide 348. Both of these sites are highlighted by dotted underlining.

Fig. 42. Localization of the 774 gene in λDY16-82 by the oligonucleotide indirect labelling method. The 2.5 kb SstI-HindIII subclone was released from the plasmid vector by cutting with PvuII 5' to the priming site (dotted line below the restriction map) and at the MI site. The mixture of fragments was partially digested with the "mapping" enzymes HinfI (•), TaqI (•), HpaII (•) and Sau3A (•). Fragments were resolved in a 1.5% agarose gel and transferred to a sheet of Hybond as described. The DNA partial products specific to the Drosophila insert were detected by hybridization to F1. The arrow head at the right edge of the autoradiograph points to the approximate location of the tRNASer gene. The resolution in this gel region is too poor to tell the precise location and transcription direction of the gene.
However, from DNA sequencing, the 5' end of the gene is close to a Sau3A site but more than 200 bp from the SstI cloning site. The restriction map is shown below, where the small arrow head points in the direction of transcription (H, HindIII; S, SstI).

[Fig. 42 autoradiograph and restriction map of the SstI-HindIII subclone; scale bar 200 bp.]

+50 gatcagtaat attggcccta aaagtcgtta aagtccggta attaagcctc
+100 tggggtcatc caaGCflGTCG TGGCCGAGCG GTTflflGGCGT CTGflCTflGflfl
+150 RTCflGflTTCC CTCTGGGRGC GTRGGTTCGfl RTCCTRCCGG CTGCGgttaa
+200 tgaatattat tttattttaa ataattaaat caataatagg cattaccctt
ttactctatg aaattataca

Fig. 43. Nucleotide sequence of the 774 gene from D. yakuba. The fragment was cloned as an 800 bp Sau3A fragment and sequenced using the gene-specific oligonucleotides GT$ and GT7. Confirmatory sequences were obtained by using the universal sequencing primer R1. The structural gene is depicted in capital letters and the three diagnostic nucleotides are underlined.

+50 ggatccgatc ggcaagaaGC AGTCGTGGCC GAGIGGTTAR GGCGTCTGAC
+100 TfiGAAATCRG RTTCCCTCTG GGAGCGTAGG TTCGARTCCT ACCGfiCTGCG
+150 aagtgtataa aactatttta tttattttaa tacaaacaag gcgctttaaa
+200 attctttaaa tatctttatt atgttctaag cacaaggtgt aaaaattgtg
+250 ttttcctttt gcttgacgaa ttcttcccgt ctaatgaatc ttttaccatt
taaaattcta taaaattccg cttttaatc

Fig. 44. Sequence of the 444-2 gene in D. yakuba that is homologous to the D. melanogaster pDt27R 444-2 gene. It was cloned as a 0.8 kb BamHI fragment and sequenced using primers GT&, GT7 and F1. The structural sequence of the gene is depicted in capital letters and the three diagnostic nucleotides are underlined. The putative termination signal is 16 bp downstream from the gene (at nucleotide 116). The direction of transcription relative to other genes in the molecular walk is deduced from sequencing through the EcoRI site (GAATTC) at nucleotide 218, which is highlighted by a dotted underline.
experiment showed that it is an expected tRNASer gene (designated as pDy444-2), by virtue of its limited but notable sequence homologies to previously cloned 444-2 genes from the other Drosophila species (fig. 44). From the sequence data, it is clear that the direction of transcription is towards the EcoRI site (~118 bp downstream). The other expected, tightly linked 444-1 homologous gene was not detectable in the λ clone; presumably it was removed by the fortuitous presence of a BamHI site during construction of the library. Note here, again, that the spatial arrangement of the 444-2 gene and the BamHI site 600 bp downstream (corresponding to a tRNAArg gene as discussed below) is also well conserved when compared to both erecta and melanogaster.

Part III - Rates of Flanking Sequence Divergence in Homologous tRNASer Genes From Different Drosophila Species

Homologous bona fide genes from the different Drosophila species were compared to determine empirically the rates of divergence in the flanking sequences. The hybrid genes usually show greater rates of divergence; the data obtained for these have not been condensed into a figure and are dealt with only as the occasion requires. In general, 5'-flanking sequences of bona fide homologous genes (444-1, 444-2 and pDt16R-777) from the different species display divergence of approximately 20-30% from each other (fig. 45). In all bona fide genes, convincing coupled -5 and -20 boxes that are gene-type specific are observed. The only exception is pDe444-2, where the -5 box is missing (line 3 in fig. 45). Only the immediate flanking regions of about 70 nucleotides are shown, but the general trend of sequence conservation is clearly pervasive in all cases where more extensive data are available. Most of the differences between species are due to point mutations that are scattered sporadically and evenly throughout the lengths of the sequences.
A slight deviation from this general pattern is observed between the -5 and the -20 boxes in the 444-1 genes from melanogaster and erecta (line 1 vs 2), where the nucleotide changes are clustered. These clustered nucleotide changes are not random, as seen in a later study, but show homology with a DNA segment in the 5'-flanking sequence of the D. erecta 474 genes.

-20 -10
1. pDe444-1 qQQflaqTTflQflfltCTTqtcTflGqGTRTTGGGCQcGCogflQcflfiCFltGTfl
2. pDt27R444-1 RRaCTTooqTRGtGTRTTGGGCttGCQtflQqRRCRQGTR
3. pDe444-2 TTflflCCTTTGGaaCcGaaflTtcGCRTflflflaTCCcGflflGRTTtnGiiflllcGRTCGGtatgRfl
4. pDt27R444-2 TTRRCCTTTGGccCtGttRTatGCRTRRRcTCCgGRRGRTTgTTGGGflTTtGRTCcoqRRtRR
5. pDy444-2 GGRTccGRTCGGcfiflgflfl
6. pCS16-777 GCCCCgflTCCt CflflflRgcgfiTcCRRTCt TcTTTTCflt GCcflMnGaCGflTCcGcGflTCfllJflfl
7. pDe777 GCCCCcafiTCCCCflflflflflCtflTtCRRTCgTgTTTTCflGCcflMnGgCGflTCgGtGflTCflllflfl
8. pDy777 cccqqCCCCCCqRRRRflCqRTcCRRTCtTqTTTTCRcGCtRRCTTGgCGRTCa-tGRTCRcTRfl

Fig. 45. Comparison of 5'-flanking sequences among homologous genes from different Drosophila species. The % homologies are as follows: line 1 vs line 2, 69%; all inter se comparisons among lines 3, 4 and 5, 73%; all inter se comparisons among lines 6, 7 and 8, 80%. Most differences are due to single mismatches. Almost all genes contain convincing -5 and -20 boxes (underlined), except pDe444-2 (line 3), where the -5 box is missing. In the case of the two 444-1 genes from erecta and melanogaster (lines 1 and 2), greater perturbations are observed in the small region between the -5 and -20 homology boxes.

In the 3'-flanking regions of these same genes, sequence divergence is surprisingly rapid. For the 444-1 and 444-2 genes, homology hovers just above randomness, at about 45% and 35-40%, respectively (fig. 46). Most of the conserved nucleotides can only be superimposed around the putative poly-T termination signals, with the rest of the sequences being essentially unique.
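How close a given homology value lies to "randomness" depends on base composition: in an A-T rich flank, two unrelated sequences match more often than the 25% expected for equal base frequencies. The sketch below computes percent identity for an aligned pair together with a composition-based chance baseline. It is an illustration under simplifying assumptions, not the scoring procedure used for the figures: the function names are hypothetical, every gapped column is scored as a mismatch, and the baseline ignores the extra matches that an optimized alignment can recover. The example pair is the gapped pDt17R-774 / Canton-S 474 trailer alignment quoted earlier in this chapter (4/17 mismatches).

```python
from collections import Counter


def percent_identity(a, b, gap="-"):
    """Percent of aligned columns carrying the same base; gapped columns count as mismatches."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != gap)
    return 100.0 * matches / len(a)


def chance_identity(a, b, gap="-"):
    """Expected percent identity if bases were paired at random,
    given the base composition of each sequence (gaps ignored)."""
    count_a = Counter(x for x in a.upper() if x != gap)
    count_b = Counter(x for x in b.upper() if x != gap)
    n_a, n_b = sum(count_a.values()), sum(count_b.values())
    return 100.0 * sum((count_a[base] / n_a) * (count_b[base] / n_b) for base in "ACGT")


# Trailer sequences as quoted in the text for fig. 27, lines 1 and 2.
LINE_1 = "TTT-AATGCTATAATTT"
LINE_2 = "TTTGTAAGCT-TAATTT"

if __name__ == "__main__":
    print(round(percent_identity(LINE_1, LINE_2), 1))
    print(round(chance_identity(LINE_1, LINE_2), 1))
```

For this pair the sketch scores 13 of 17 columns as identical (about 76%) against a chance baseline of about 41%, which illustrates why values of 35-45% in A-T rich trailers are hard to distinguish from chance.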
However, the three 777 genes do share higher degrees of homology (~60%), although these remain depressed relative to the 5'-flanks (fig. 46). Against this backdrop of variability, the two 474 genes of D. melanogaster and D. erecta appear to record a distinctly different pattern of sequence alterations (fig. 47, lines 2 vs 3). The overall homology in the 5'-flanking region is only ~55%, with infrequent blocks of conserved nucleotides that are A-T rich. Also in contrast to the bona fide tRNA genes above, the mutations are not homogeneously peppered throughout. In particular, the flanking region closest to the gene, between -1 and -30, shows the most divergence (homology is at best 35% even with loopouts). However, the replaced nucleotides are not random; instead, as alluded to above, they show striking homology patterns and sequence alliance with their respective intra-specific 444-1 genes, where "intra-specific" is defined as within a species (Dover, 1982) (fig. 47, lines 1 vs 2 and lines 3 vs 4). For the 774 genes, the 5'-flanking sequences display average homologies of ~72%-78%. The values are probably slightly inflated, since insertions or deletions of entire sequence blocks have been scored as single mutational events. Nonetheless, homologies are ample throughout most of the lengths of the sequences displayed in fig. 47 (lines 5-7). The two conserved boxes at the -5 (CATCAA) and -15 (GCCAC) positions, described in an earlier analysis for the D. melanogaster pDt17R-777 gene, are also present in the 774 genes from all three species. The presence of these two segmental homologies in all three sibling species suggests that their origin may be ancient and that they may serve some biological function, such as contact points for transcription factors. However, a similar functional significance is more difficult to assign to a third homology segment at the approximate position of -36.
Fig. 46. Comparison of 3'-flanking sequences among the homologous bona fide genes from the different Drosophila species. [Sequence alignment, lines 1-8: 1. pDe444-1; 2. pDt27R444-1; 3. pDe444-2; 4. pDt27R444-2; 5. pDy444-2; 6. pCS777; 7. pDe777; 8. pDy777.] The % homologies are as follows: line 1 vs 2, 45%; inter se comparisons among lines 3, 4 and 5, 35%-40%; inter se comparisons among lines 6, 7 and 8, 60%. Most of the conserved regions correspond to the putative poly-T termination signal. However, the three 777 genes (lines 6, 7 and 8) show higher levels of conservation (which is also observed in their 5'-ends). Thus, the 3'-ends of genes record 25-40% faster sequence divergence in general, relative to their 5'-ends. Note that the trailer sequences between the 3'-end of the structural genes and the poly-T signals are not well-conserved. This appears to be a prevalent trend in the evolution of tRNA genes in general.

From comparison of the 774 genes between the different species, the small region around -36 also appears as a site of clustered mutations. At least two separate mutational events are notable. Relative to D. erecta (pDe774, line 5), the yakuba copy carries an additional sequence inserted at -36, AAAAGTCGT, immediately upstream from the GGGCCT tract (pDy774, line 6). In D.
melanogaster, there is no extra sequence, but the latter tract is replaced with the sequence TGTAGCTA, which resembles a similar sequence in the intra-specific pDt17R-777 gene (line 8). Although this third segmental homology present in the 774 gene is less dramatic relative to those in the 474 gene, the substitution again does not appear to be capricious, but submits to the same pattern of intra-specific sequence alliance and spatial conservation that has been the general theme with all other hybrid genes studied thus far.

No homology was detectable in the 3'-flanking sequences of the pDe474 and pDe774 genes near the poly-T termination signal, as would be expected from the melanogaster studies. This is not surprising, since the surveys in the sibling species show that even the 3' tails of homologous tRNA genes usually have extremely poor sequence conservation. However, pDe474 could also share a common lineage with other tRNASer genes in the cluster that have not been cloned. For example, a likely candidate would be the intra-specific equivalent of the pDt17R-774 gene, based on the earlier analysis in melanogaster. Since this gene as well as the remaining tRNASer genes have not yet been cloned, a complete analysis as for the melanogaster genes is not possible.

In summary, two distinct types of segmental homologies have been identified from sequence comparisons of all available 5'-flanking sequences of tRNA4,7Ser genes. The first (type I) is the coupled homology elements that are always held at positions -5 and -20. They can in fact be divided further into two different subclasses, since these elements are highly specific for either 444 or 777 genes. Because homology boxes are generally extremely rare in the 5'-flanking sequences of tRNA genes, the most striking aspect is the persistence of the -5 and -20 boxes in all of the bona fide tRNASer genes obtained from different Drosophila sibling species spanning at least 13 million years of evolution.
It is possible that such a high degree of conservation may reflect some functional status within the cell. For example, they may serve to modulate the rate of, or to direct the correct initiation sites for, transcription dictated by the particular gene types above.

Fig. 47. Evidence for concerted evolution of tRNA4,7Ser genes. [Sequence alignment, lines 1-8: 1. pDe444-1; 2. pDe474; 3. pCS474; 4. pDt27R444-1; 5. pDe774; 6. pDy774; 7. pCS16-774; 8. pDt17R-777.] Comparison of the 474 genes from erecta and melanogaster showed rapid sequence divergence in the immediate 30 nucleotides 5' to the genes (lines 2 vs 3). The only obvious conservation is the sequence CATA, at position -6 in pDe474 and -8 in pCS474. The sequence divergence is far from random; the nucleotide replacements show strong homology with their intra-specific 444-1 genes. The 774 genes from three different species (lines 5-7) also showed coincidental clustered mutations at approximately -40. In melanogaster, the replacement sequence in pCS16-774 is almost identical to a spatially conserved tract in pDt17R-777 (lines 7 vs 8). Note that the co-evolving DNA segments between non-allelic genes within the same species (intra-specific) are reflected as clustered mutations between the homologous genes from different species (inter-specific).

The second type (type II) of segmental homologies is much more heterogeneous in sequence, in size, and in spatial arrangement.
One generalization that can be applied is that they are only found in gene pairs; one of the genes is always a hybrid gene and the other is always a bona fide gene. The other generalization is that, with rare exceptions, both 5'- and 3'-segmental homologies are not well-conserved across species. Since they do not usually cross species boundaries, this would strongly suggest that these small segmental homologies arose only after the sibling species diverged, from 2 to 37 million years ago. The origins of these small homology patches make sense once the focus of attention is shifted to the comparison of the 5'-flanking sequences of homologous genes derived from across species. For example, when the homologous 444-1 genes (fig. 45, lines 1 and 2), 474 genes (fig. 47, lines 2 and 3) and 774 genes (fig. 47, lines 5, 6 and 7) are aligned, a background of mutations randomly distributed throughout the lengths of the DNA is readily apparent. Above this background "noise", there are also major interruptions in homology characterized by clustered base changes. In every case, these major interruptions can be aligned in sequence and in a spatially conserved manner with another tRNASer gene, which is invariably intra-specific. Thus, like the three point changes internal to the genes, the clustered changes in the flanking region are not novel mutations but are merely "renovations" using materials from known donor genes within the 12DE gene cluster.

3'-Flanking homologies have also been identified in the tRNASer genes in melanogaster, where more complete data are available. One may argue that since this region is A-T rich, the homologies may only be apparent, owing to this biased nucleotide content. This explanation seems unlikely. Again, when homologous genes are compared from across species, the nucleotide sequences in these trailer regions show homology only slightly above random.
Sequence randomness in the trailers tends to be a general feature also found in other families of tRNA genes, even when they are closely linked; for example, tRNAArg at 12DE reported in Chapter III and those at 42A reported by Yen and Davidson (1980). Hence, based on considerations from other tRNA gene families and from the homologous tRNASer genes from across species, those segmental homologies for the non-allelic tRNASer genes at 12DE would be highly significant (1.8- to 2.0-fold higher in sequence conservation), despite their biased base content.

The most intriguing correlation from the above sequence comparisons is that both the segmental homologies in the flanking sequences and the internal base changes at positions 16 and 77 within the structural genes are always inherited together as either tRNA4Ser- or tRNA7Ser-type. Thus, these telltale signs obtained from the independent analyses above converge and assemble into a coherent picture entirely conforming to the central prediction of Molecular Drive. Furthermore, the above observations do not support a model invoking reciprocal exchange. Note that the local organization of both tRNASer and tRNAArg genes in λDE16 (from D. erecta) and λDY16-82 (from D. yakuba) is extremely similar to that in D. melanogaster, with no gross rearrangements detectable in the two corresponding chromosomal regions. Although the spacer DNA between the local tRNA gene clusters is variable, its rearrangement does not extend into or affect the relative organization of the genes. Furthermore, in both erecta and melanogaster, it has been clearly demonstrated at the DNA sequence level that the 444-1 and the 474 genes are co-evolving. If these hybrid genes were generated by DNA slippage followed by recombination, it would be expected that the tRNAArg genes on either side would be deleted during straddling and recombination between the tRNASer genes. This definitely has not been observed.
Note that while the tRNAArg gene cluster encoded in the pDt27R region does fluctuate in size between different melanogaster strains and species, as will be discussed shortly, the fluctuation would be better explained by a standard recombination process within the small direct repeats flanking these genes, rather than as the direct result of conversion events between the adjacent tRNASer genes. The structures of the tRNASer genes themselves have not changed across species, and the reason for this is not clear. However, the fact that the 474 and the 444-1 genes in the two species examined, melanogaster and erecta, always evolve cohesively, despite their divergence from each other between 2 and 13 million years ago, suggests that perhaps certain types of gene interactions are favored, constrained either by selection or by the type of surrounding sequences (such as stabilization by repetitive elements, hotspots for conversion or pairing constraints of the chromosomes), that may exert a strong influence on the frequencies and types of hybrid genes observed.

CHAPTER III

tRNAArg Genes at 12DE

As suggested by the observations in the previous chapters, it is likely that the hybrid tRNASer genes were formed by conversion events between the bona fide 444 and 777 genes. All of the existing models that have been formulated to explain gene conversion at the molecular level invoke heteroduplex formation as the key intermediate step in the process (reviewed by Szostak et al., 1983). Thus it is conceivable that occasionally the DNA strands at 12DE may slip and slide, causing adjacent 444 and 777 genes to mispair, forming the heteroduplex as the necessary prelude to gene conversion. Because this region is also rich in repetitive sequences, it is also possible that mispairing at some of these repeats may assist in stabilizing the tRNA4Ser/tRNA7Ser heteroduplex, or may even be critical for initiating the slip-slide events themselves.
Since half of the tRNASer genes at 12DE have DNA patchwork reminiscent of gene conversion (see Discussion), and since the repetitive elements in this same region are highly polymorphic between the different Drosophila species, DNA slippage and recombination may be common occurrences. Since the tRNAArg genes are interspersed among the tRNASer genes, it is reasonable to assume that both gene families should be equally susceptible to this local distortion of the DNA. Thus, in this chapter, I have exploited the tRNAArg genes encoded within pDt27R as independent monitors for DNA slippage in this region.

The impetus for this work was provided by the available sequence data of pDt27R (Newton, 1984). Downstream from the tRNA4Ser genes, there are four clustered BamHI sites, which are actually part of the coding sequences (beginning at position 36) of four duplicated tRNAArg genes. Each of the duplicated units is demarcated by two flanking direct repeats, TAGCCCAA (fig. 55). Two of the duplicated cassettes are 600 bp in length (pArg12.1 and pArg12.2) and the other two are 200 bp variants (pArg12.3 and pArg12.4) (fig. 21). The 600 bp units differ from the smaller 200 bp ones by extra nucleotides in the 5'-flanks, but all four duplicated units are virtually identical in their 3' regions. The structural sequences of the tRNAArg genes are also identical, but bear a characteristic T13 to C13 mutation that distinguishes them from all other members in the genome (Newton, unpublished observations). This organizational pattern of the tRNAArg genes would thus strongly suggest that the gene cluster arose from repeated duplication of an ancestral gene. The most likely mechanism would involve unequal exchange at the direct repeats demarcating the duplicated units (at least for one of the duplication events).
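The way unequal exchange at flanking direct repeats changes gene copy number can be sketched with a toy model. The Python below is an illustration under stated assumptions, not a reconstruction of the actual events at 12DE: each chromatid is represented as a list of abstract units, "R" stands for one copy of the direct repeat, "Arg" for one tRNAArg gene, the 600 bp and 200 bp size classes are ignored, and the function names are hypothetical.

```python
def unequal_exchange(chromatid_a, chromatid_b, i, j, repeat="R"):
    """Cross over between repeat i of chromatid_a and repeat j of chromatid_b.

    Returns the two recombinant products. When i and j point at repeats in
    different positions (a misaligned pairing), one product gains the units
    lying between them and the other product loses them.
    """
    if chromatid_a[i] != repeat or chromatid_b[j] != repeat:
        raise ValueError("both exchange points must be direct repeats")
    product_1 = chromatid_a[: i + 1] + chromatid_b[j + 1 :]
    product_2 = chromatid_b[: j + 1] + chromatid_a[i + 1 :]
    return product_1, product_2


def copy_number(chromatid, gene="Arg"):
    """Number of gene units carried by a chromatid."""
    return chromatid.count(gene)


# Assumed ancestral arrangement: a single gene flanked by two direct repeats.
ancestor = ["5'", "R", "Arg", "R", "3'"]

# The downstream repeat (index 3) of one chromatid pairs with the upstream
# repeat (index 1) of the other.
duplicated, deleted = unequal_exchange(ancestor, ancestor, 3, 1)
print(duplicated, copy_number(duplicated))
print(deleted, copy_number(deleted))

# A second misaligned exchange between two duplicated chromatids.
triplicated, single = unequal_exchange(duplicated, duplicated, 5, 3)
print(copy_number(triplicated), copy_number(single))
```

In this model the first misaligned exchange yields one chromatid with two genes and one with none, and a further exchange between duplicated chromatids yields three-copy and one-copy products. Each duplicated unit remains bounded by repeats, as described for the cassettes in pDt27R.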
Therefore, even though the evolution of the tRNA4,7Ser and the tRNAArg genes may involve two distinct mechanisms, both are most likely related by the same triggering event of local mispairing of DNA. The degree of fluidity at the tRNAArg gene cluster was first measured by genomic Southern blotting experiments on commonly available laboratory strains of melanogaster (Newton et al., 1987). It was shown that 40 of the 45 strains (from Europe, North and South America, Asia and Africa) have the four duplicated tRNAArg genes arranged as described above, while the rest have only three copies of the genes, with one of the 600 bp units missing. However, no copy number lower than three has been found in these strains. All sibling species, except sechellia and orena, which have not been measured, showed only a single gene at the homologous sites.

These encouraging results thus prompted an extended study of their organizational patterns, at the DNA sequence level, in the more distantly related sibling species erecta and yakuba. Their corresponding DNA segments had already been cloned in the earlier studies (Chapter II). From the previous examination of tRNASer genes presented in Part II of Chapter II (sections C and F), λDE16 and λDY16-82 also harboured pDt27R homologies at one end of the clones (figs. 33 and 39). Blotting experiments with tRNAArg-specific oligonucleotide probes (Arg5' and Arg3') have confirmed the presence of these genes associated with BamHI sites. However, in both of these species, only one tRNAArg gene has been identified at the homologous locations, in agreement with the previous Southern analyses. For each of these solo genes, there is a small sequence resembling the melanogaster repeats 3' to the gene, but insufficient sequencing data in the 5'-end preclude the identification of a similar upstream copy.
However, the high degree of sequence conservation in the 5'-flanks of the tRNAArg genes across species would suggest that a similar copy should exist. The results would thus strongly support the notion that the present-day tRNAArg gene cluster in melanogaster has been derived by unequal exchange at the short repeats flanking the ancestral single unit. Even though the tRNASer and the tRNAArg genes at 12DE have embarked on separate evolutionary pathways, their respective organizational patterns are probably rooted in the common triggering mechanism of regional DNA distortion.

RESULTS

tRNAArg Genes in λDE16 From D. erecta

Two oligonucleotide probes, Arg5' and Arg3', were used to identify coding sequences for tRNAArg genes within the λ clone. The former probe is identical to the coding strand from +3 to +20, while the latter is identical to the tRNA sequence from nucleotides +56 to +72. Blotting experiments revealed a single tRNAArg gene (designated pDeArg-1) marked with the expected BamHI site ~600 bp downstream from the two tRNA4Ser genes. Moreover, probing with Arg3' also uncovered the 3' half of another tRNAArg gene (designated pDeArg-6) associated with the left-most BamHI site abutting the λ vector arm (fig. 33). Sequencing experiments show the expected pDeArg-1 downstream from the tRNA4Ser genes (fig. 48). However, the structural sequence of pDeArg-1 differs from its counterparts in melanogaster by having a T, rather than a C, at nucleotide 13 within the Block A promoter. The 28 nucleotides immediately 5' to the structural gene are well conserved between erecta and melanogaster (82%). This extent of homology defines the 5' limit coincidental with the 200 bp duplicated units (pArg12.3 and pArg12.4) in D.
melanogaster. Beyond this transition point, the 5'-flanking sequence of pDeArg-1 shows about 70% homology with pArg12.1 of melanogaster. The pattern of homology is just the reverse in the 3'-flanking region, where significant matches (~80%) with the same region of pArg12.4 of melanogaster can only be detected near the small repeat and beyond, beginning at 55 nucleotides downstream from the gene (nucleotide +195 in pDeArg-1, fig. 48). The interstitial sequence block between the structural gene and the small direct repeat, which coincidentally delimits the 3' tail of all duplicated units in melanogaster, is composed of completely divergent nucleotides.

    tttatcttgt aaaagggcct tacttttgct tacatttggg tgacatatgc   +50
    tgggaatttg ggcggcaatt gcgcaactGA CCGTGTGGCC TAATGGATAA   +100
    GGCGTCGGAC TTCGGATCCG AAGATTGCAG GTTCGAGTCC TGTCACGGTC   +150
    Gtgatgctaa gtttttattt ttcgtaagcc caataaaata attccatggg   +200
    agattcccta gccaataacc ctttttgtgt aacctgagtg aggtaagcag   +250
    ccatcccaac caattggcat a

Fig. 48. Sequence of pDeArg-1 from D. erecta. The gene was cloned as separate BamHI fragments. Several isolates were independently picked and sequenced by using the universal sequencing primer F1. Thus, only sequence from one strand was obtained, but it was confirmed several times from the independent isolates. The gene is depicted in capital letters and the characteristic BamHI site, GGATCC, is located at position 114 (dotted underline). The nucleotide T13 is highlighted by a solid underline (position 91), rather than the C13 that is characteristic of the clustered melanogaster tRNAArg genes. The 3'-oligonucleotide corresponding to the melanogaster flanking repeats is located at position 208 in the above figure (boxed). The other direct repeat upstream from the gene has not been sequenced. The region that is highly homologous to the 200 bp melanogaster duplicated units starts at nucleotide 51 and ends at nucleotide 238. The 5'- and 3'-flanking sequences of pDeArg-1 show the most homology with the corresponding regions of the two outermost genes in the melanogaster cluster, pArg12.1 and pArg12.4, respectively. The most rapidly diverging region corresponds to the 54 nucleotide sequence between the end of the tRNA structural gene and the 3' small flanking repeat (nucleotides 152 to 195).

Abutting the left-most BamHI site in λDE16 is the 3' end of another tRNAArg gene, pDeArg-6 (fig. 33). Sequencing experiments demonstrate significant homology between this gene region and its melanogaster counterpart, pArg12.6, a gene that is placed at least 11 kb away from the tRNA4Ser genes (established in the molecular walk) (fig. 49). In addition, this tRNAArg gene is transcribed in the opposite orientation in the two species. It is not known whether the intergenic sequences have been deleted in D. erecta or whether the tRNAArg gene has been brought into close proximity to the tRNA4Ser genes by a simple inversion. Blotting experiments using probes derived from the pDt27R molecular walk in D. melanogaster failed to resolve this issue, since they hybridized to many sequences in the D. erecta genome (data not shown, but see fig. 60, lane 6).

    GGATCCGAAG ATTGCAGGTT CGAGTCCTGT CACGGTCGat tgtctaactt   +50
    ttttttcttg tatacgacat ataatttcct tgagttctga atattacata   +100
    ttttattaat tcgtcaagcc

Fig. 49. Sequence of the 3'-end of pDeArg-6 from D. erecta. The structural gene is displayed in capital letters. The BamHI site (GGATCC) marking the gene is at the 5'-end of the insert DNA (underlined). The 3'-flanking sequence of the gene shows approximately 75% homology with pArg12.6 of D. melanogaster in the pDt27R molecular walk. Several different isolates were sequenced using the universal primer, F1. Only sequence information from one strand of the DNA was obtained, but the above was confirmed several times.

tRNAArg Genes in λDY16-82 From D. yakuba

The pDt27R homologous region of this phage was also analyzed for the presence of tRNAArg genes.
Only one was identified (designated pDyArg-1), again in association with a BamHI site approximately 600 bp downstream from the lone tRNA4Ser gene, as illustrated in fig. 39. Sequence analysis shows that it is identical to pDeArg-1, also retaining the "original" T nucleotide at position 13 (fig. 50). Comparison of the 5'-flanking sequences between pDeArg-1 and pDyArg-1 shows that the 49 nucleotides immediately 5' to the coding sequences have predominantly single base changes, totalling ~23% divergence. Beyond this point, the mutational events are a mixture of deletions of 2-7 base pairs and extensive base substitutions in strings of at least 20 nucleotides in length. Similar to the case described in the D. erecta section, the more distantly located 5'-flanking region of pDyArg-1 shows correspondingly more divergence but still maintains strong (~70% on average) homology with pArg12.1 of melanogaster. The 3'-flank of pDyArg-1 also shows strong homology with pArg12.4 of melanogaster (80%), starting at the small direct repeat and beyond. The interstitial block between the structural gene and the direct repeat is also composed of unique sequence. When this interstitial block is compared between pDyArg-1 and pDeArg-1, there remains some weak homology (~60%). The differences include many single base changes and small deletions. Thus, it appears that the most rapidly diverging sequence in the three sibling species occurs between the 3'-end of the structural gene and the direct repeat.

    ggatccttga tgatgtcttt aatattatta atgcactaac tttaagtata   +50
    aaataatgat taaataagta tgttaatgta aagcgaggtt tatctgcaat   +100
    atgagaaact atcatcaatg atagtcagct tacttacatg ggcgttacat   +150
    atgttcggaa tttcggacgt cgattgcgta actGACCGTG TGGCCTAATG   +200
    GATAAGGCGT CGGACTTCGG ATCCGAAGAT TGCAGGTTCG AGTCCTGTCA   +250
    CGGTCGtaat gtcaagtttt tatttttcgt aatccccatt taataattgt   +300
    agcccagtct ttttgtaacc tgagtggagt aagcgggaat agcaaccaat   +350
    tggcaaaccc aattgaaaga tttattggac ttttacatgg gtcttcttcc   +400
    atggacgacg aatcaacatg tggctgccat c

Fig. 50. Nucleotide sequence of pDyArg-1 from D. yakuba. The sequence of the structural gene is displayed in capital letters. The gene was cloned on separate BamHI fragments. Several independent isolates were picked and sequenced with the universal primers. The BamHI site, GGATCC, marking the gene is at position 212 (dotted underline), which is 3' to the anticodon TCG. The nucleotide T13 within the gene is underlined. The putative termination signal is 11 bp downstream from the structural sequence. The region that shows strong homology (>80%) to the 200 bp duplicated unit in D. melanogaster begins at nucleotide 157 and ends at nucleotide 324. Beyond this region, the 5'-flanking and 3'-flanking sequences show 70% to 80% homology to the corresponding regions of the melanogaster genes pArg12.1 and pArg12.4, respectively. The most rapidly diverging region is the sequence between nucleotides 257 and 299, which is between the end of the structural gene and the direct repeat-like oligonucleotide sequence, TAGCCCA. This putative 3' direct repeat is highlighted by a box.

Thus, in summary, there is only one tRNAArg gene in the pDt27R homologous regions in both erecta and yakuba. The structural sequences of the tRNAArg genes from both sibling species are identical but differ from the melanogaster counterparts at nucleotide 13 (C in melanogaster and T in the other two sibling species). Since the nucleotide T also occurs in all other melanogaster tRNAArg genes, except those at 12DE, it would suggest that the ancestral gene was probably similar, if not identical, to those represented by the two sibling species. The 3'-tails within the duplicated regions among the genes from the different species are not well conserved. Weak homology is still detectable for species derived from the same sibling species complex, but the sequences are completely divergent if they are derived from the two different species complexes (e.g. yakuba vs melanogaster).
However, sequence identity of about 80% is evident starting at the small direct repeat and beyond for all inter se comparisons. Insufficient sequence data in the 5'-flanking sequences of either pDeArg-1 or pDyArg-1 preclude the confirmation of a similar upstream small direct repeat. However, it most likely exists in these two genes as well, judging from the high degree of homology shared among the available flanking sequences in all three sibling species. A further minor generalization stemming from sequence comparisons of the tRNAArg genes across species is that the 3'-ends immediately outside the genes are the most rapidly diverging region, which is also the case with the tRNASer genes reported previously.

The presence of the small repeats surrounding the tRNAArg genes, well-conserved across species (at least for the downstream copies) and demarcating each of the duplicated cassettes in melanogaster, suggests that unequal exchanges at or near the repeats could be responsible for at least the first duplication step in the tRNAArg genes. Subsequent duplication steps could occur anywhere within the duplicated units, not necessarily restricted to the repeats, and would still yield the same final organization. In fact, duplication of the first tRNAArg gene would enhance unequal exchange by providing an increase in the length of homology, and may explain the propensity for the higher copy variant in the melanogaster population (i.e. 90% four copies vs 10% three copies) and the absence of two- or one-copy variants. The conserved existence of the tRNAArg genes flanking the tRNA4Ser genes in the sibling species would undermine standard recombination as a mechanism for generating the hybrid tRNASer genes. In contrast, both the appearance of hybrid tRNASer genes and the duplicated tRNAArg genes at 12DE could be explained by DNA slippage as the common triggering mechanism contributing to the divergent evolutionary pathways observed in the two gene families.

DISCUSSION

1. The Overall Molecular Organization of 12DE

A chromosomal walk has been conducted in the polytene chromosome bands 12DE in D. melanogaster using both cosmid and λ genomic libraries. A total of eight tRNA4,7Ser genes have been collected from this chromosomal region, which is at least 157 kb in size. A compendium of these genes and their sites of origin on the molecular map is tabulated in Table V. While there are two each of the 444 and 777 genes containing sequences predicted from their tRNAs (Cribbs, 1982), there are three other hybrid structures that are composites of the tRNA4,7Ser genes (two 774 and one 474). The remaining gene, designated 444*, was found in the pDt17R molecular walk. It is characterized by a C50 to T50 mutation at the tip of the extra arm, while its three diagnostic nucleotides remain indicative of tRNA4Ser.

Nestled among the tRNASer genes are six tRNAArg genes (Newton et al., manuscript in preparation and fig. 51). Of these, five are encoded within the pDt27R domain and the remaining one is ~200 bp upstream of the 444* gene in the pDt17R molecular walk. The overall molecular organization of the tRNA genes at 12DE would thus conform to other previously studied clusters where two or more families of tRNA genes have been found to co-exist. In this case, the direction of transcription for the X-linked genes also appears to be completely random. However, there are two unique properties associated with the organization of the tRNA4,7Ser genes not found in other clusters. First, this is the only known gene cluster that harbours two different isoacceptors for the same tRNA, whereas all other clusters reported so far house but a single isoaccepting species for any tRNA. Second, tRNA4Ser and tRNA7Ser recognize two non-overlapping sets of codons and are thus functionally distinct. Yet, their genes are extremely similar in sequence.
These two unique properties may facilitate interactions, such as conversion or reciprocal exchange, between the two gene types to produce hybrid sequences. This latter point is discussed more fully in section 2 below.

TABLE V - A Summary of tRNA Genes Identified in Bands 12DE

    Domain    Size (kb)    Genes
    pDt73     ~33.5        474
    pDt17R    ~45          444*, 774, 777; tRNAArg (T13)
    pDt27R    ~56.5        444, 444; one tRNAArg (T13), four tRNAArg (C13)
    pDt16     ~22.5        777, 774

All of the tRNA4,7Ser genes in the D. melanogaster genome have been previously identified as either eight EcoRI or seven HindIII restriction fragments by Southern blotting (Cribbs et al., 1987b). Approximately half of these fragments from either restriction digest have been assigned to the X chromosome based on their weaker intensity of hybridization with DNA prepared from the male (one X chromosome) relative to that from the female (two X chromosomes). All of these X-linked tRNASer restriction fragments were accounted for in the molecular walk. The results are summarized in fig. 51b. Although minor variants of genomic clones were occasionally obtained, the only thoroughly substantiated polymorphism detectable is in the clone 420R, where the HindIII Drosophila insert is ~17 kb rather than the smaller 6.5 kb found in the original pDt27R (fig. 51, lane 5). This larger HindIII fragment probably represents a high frequency variant, since it has been repeatedly observed in genomic Southern blots in several different fly strains (data not presented). Thus, it is likely that of the approximately 12 tRNASer genes estimated in the genome (Cribbs et al., 1987b), only eight genes exist at 12DE.

This chromosomal region appears to be well under-represented in all genomic libraries used in the experiments.
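The dosage argument used above to assign restriction fragments to the X chromosome can be made explicit. The Python sketch below is illustrative only: it assumes that equal amounts of male and female DNA were loaded, that hybridization intensity is proportional to gene copy number, and that band intensities have been quantified. The function name, the tolerance and the intensity values are hypothetical and are not taken from the experiments.

```python
def classify_linkage(male_signal, female_signal, tolerance=0.2):
    """Classify a band from its male/female hybridization intensity ratio.

    Expected ratio (assumption: signal proportional to copy number):
      X-linked fragment  -> 0.5  (one X in males, two in females)
      autosomal fragment -> 1.0  (two copies in both sexes)
    """
    if female_signal <= 0:
        raise ValueError("female signal must be positive")
    ratio = male_signal / female_signal
    if abs(ratio - 0.5) <= tolerance:
        return "X-linked"
    if abs(ratio - 1.0) <= tolerance:
        return "autosomal"
    return "ambiguous"


# Hypothetical band intensities in arbitrary densitometry units.
print(classify_linkage(48, 100))   # ratio 0.48
print(classify_linkage(95, 100))   # ratio 0.95
print(classify_linkage(75, 100))   # ratio 0.75, equally far from both expectations
```

A ratio falling midway between the two expectations is left unclassified rather than forced into either class, which reflects the fact that intensity comparisons on Southern blots are only semi-quantitative.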
With the exception of the pDt73 molecular domain, most Jt or cosmid clones occur at 2.5 to 5% the frequencies expected of impartial representation (Bender et al, 1983: Kaiser and Murray, 1985); further, impasses met in one library were often encountered repeatedly in all others as well. The reason for the under-representation of sequences is unclear, but it may be due to the high density of repetitive elements present at 12DE that are poorly tolerated by the more commonly used £ «>//'hosts. Some of these repeated sequences are probably non-essential for viability, since those tested are either entirely absent or highly variable in numbers in the different Drosophila sibling species as revealed by genomic Southern blotting. For example, the 10 kb Hindlll+EcoRI fragment between coordinates 30 and 38 in the pDt27R walk contains sequences that are moderately repeated in melanogaster, simulans, mauritiana and erecta, but highly repeated in both teissieriand yakubaXdaia, not presented). Also by direct molecular cloning, it has been demonstrated that the intervening sequences (at least 20 kb) separating the pDtl6R and pDt27R domains in melanogaster are absent in both I clones derived from erecta and the yakuba (figs. 33 and 201 Fig. 51. tRNA47S e r and t R N A A f 8 genes at 12DE, (A), h and cosmid clones with the least overlap representing the 157 kb from 12DE are shown. The recombinant clones have all been cleaved with Hindlll and resolved by electrophoresis in a 0.7% agarose gel. Lanes 1-3L736, lane 2=31731 and lane 3-31739 are from the pDt73 walk; lane 4-312161R is from the pDtl 6R walk; lane 5-420R. lane 6-cosP273R are the from pDt27R walk; lane 7-3L1731R, lane 8-31731. lane 9-311722 are from pDtl7R the walk. (B). The DNA from the gel was transferred onto a sheet of Hybond filter and probed with GT7, which is specific for tRNA4 7 S e r genes. 
The hybridization signals in lane 2 (~4.7 kb), lane 4 (~8.0 kb) and lane 8 (~10 kb) correspond to the HindIII Drosophila inserts cloned in pDt73, pDt16R and pDt17R, respectively. The hybridization signal in lane 5 is a ~17 kb polymorphic fragment corresponding to pDt27R, which has been encountered in a number of non-isogenic strains. The 1.8 kb hybridization signal in lane 6 is a small fragment overlapping the tRNASer gene in pDt27R. One end of this fragment contains the HindIII site from the polylinker cloning site in cosPneo; the other corresponds to a real HindIII site in the genome. (C) The same filter hybridized to Arg3'. The only signals clearly detectable are in the pDt27R (lanes 5 and 6) and pDt17R (lane 8) regions. Some faint bands are due to background from over-exposure of the film. Only the tRNASer genes are detectable with total 4S hybridization, suggesting that the tRNAArg genes may be under-expressed in vivo.

39). One unusual class of repeats that is also non-essential for viability, but known to be functional, was encountered during the molecular walk in the pDt73 domain. These repeats were identified previously by Hardy and Kennison (1980) as a genetic unit, called Ste or Stellate, playing a role in male fertility. Subsequent molecular cloning by Lovett et al. (1983) and Livak (1984) showed that Ste is a multigene family consisting of 1.3 kb repeats that are tandemly reiterated ~200 times on the X chromosome near polytene band 12F and ~80 times on the long arm of the Y chromosome. The copy numbers are highly variable among melanogaster strains, and the repeats are entirely missing in four of the seven sibling species tested (this report and also Livak, 1984).
Their function remains unknown, but those on the Y chromosome may also play a negative regulatory role in controlling the expression of the X-linked copies. It has been shown that poly A+ RNAs homologous to a Ste cDNA clone are 30 to 70 times more abundant in XO testes, or in animals carrying Y deficiencies deleted for the presumptive regulatory region, than in XY testes. The high levels of RNA homologous to the 12F sequences are exactly correlated with the appearance of star-shaped crystals in the testes (hence the name) and with sterility in the male.

The four domains obtained by molecular walking presented in Chapter I have not been successfully joined as a contiguous chromosome segment, and thus the overall organization of the tRNASer and tRNAArg genes within polytene bands 12D to 12E2-3 remains unknown. However, a number of inferences can still be drawn from several related observations. From the pDt73 molecular walk, it is known that the 474 gene is approximately 9.0 kb away from the Ste sequences, which in all probability occupy segments 12E-12F on the polytene chromosome as a continuous block (Chapter I and Livak, 1984). Previous in situ hybridization using purified tRNASer has demonstrated that all coding sequences are localized distal to 12F (Hayashi et al., 1980). Hence, the juxtaposition of pDt73 and the Ste sequences in the molecular walk indicates that the 474 gene is almost certainly the most proximal in the cluster (closest to the centromere). The other two domains, pDt16R and pDt27R, could be adjacent in melanogaster, a possibility that is intimated by the spatial conservation observed in both D. erecta (JLDE16) and yakuba (3LDY16-82) despite their divergence approximately 16

Fig. 52. The current progress in the assignment of the X-linked tRNA genes to polytene bands.
The line at the top of the figure is the genetic map of the X chromosome with well-known markers as indicated (y = yellow; cv = crossveinless; v = vermilion; g = garnet; f = forked). The dark circle to the right of the genetic map represents the centromere. The tRNA genes are proximal to the genetic marker garnet, which is located at 12B. Below is the expanded area near garnet as seen in the polytene X chromosome (redrawn and modified from Waring et al., 1983). From the molecular walk and from the copy number estimate for the Stellate sequences by Livak (1984), these testis-specific genes are proximal to the tRNA gene cluster and exist as a large block of tandem repeats totalling approximately 260 kb between 12E and 12F. The following deficiency mutants are being analyzed by both in situ hybridization and genomic Southern blotting: Df(1)g'fB deletes the segment from bands 12A to 12E; HA92 deletes the segment from bands 12A&-7 to 12D; KA9 deletes the segment from bands 12E2-3 to 12F/13A; RK2 deletes the segment from bands 12E1 to 13A2-5. Only Df(1)g'fB is known to delete the entire tRNA cluster as well as a large proportion of the Stellate sequences. The two small deficiencies, HA92 and KA9, do not show a decrease in hybridization intensity, relative to the 23E site on autosomal arm 2L, as determined by in situ hybridization with tRNASer (S. Hayashi, unpublished) and by Southern blotting with cloned probes (data not shown). The mutant strain Df(1)g'fB/RK2, which is homozygous for the deletion from bands 12E1 to 12F1, has been constructed by Dr. D. A. R. Sinclair. Hybridization of pDt73 and pDt17R to genomic Southern blots prepared from this strain failed to show any signal. However, re-hybridization of the same filter to pDt27R and pDt16R showed intense hybridization to fragments of the expected sizes (Sinclair, unpublished).
These results strongly suggest that the order of the plasmids containing tRNA genes, from proximal to distal along the X chromosome, is pDt73, pDt17R, [pDt16R and pDt27R]. These results, along with those obtained with the strain HA92 in the in situ hybridization experiments, would thus suggest that the tRNA genes are probably located at or very near band 12E1. The Df(1)g'fB/RK2 strain also removes >90% of the Stellate signal from the earlier identified X-linked site (Sinclair, unpublished), but the flies survive to adulthood. This would imply that almost the entire 12E chromosome segment contains no genes essential for viability. The adult Df(1)g'fB/RK2 females are sterile, however, and it is suspected that a female sterile mutation is located near one of the tRNA genes, possibly proximal to the tRNA gene cluster. A female sterile mutation has previously been genetically assigned to this region (Waring et al., 1983).

million years ago, and by deletion analyses employing various mutant melanogaster strains (fig. 52). However, this remains only a reasonable conjecture, resting on the assumption that the two domains have also remained sequentially conserved and have not been relocated by inversions during D. melanogaster evolution. Using several deficiency mutants deleted for small segments within 12DE, the combination of in situ hybridization and Southern blotting experiments further suggests that all coding sequences are distal to band 12E2, and possibly within band 12E1 (Leung, Hayashi and Sinclair, unpublished observations). The current status of the mapping studies is summarized in fig. 52.

2. Co-evolution of the tRNA4,7Ser Genes

The 82-nucleotide coding sequences of the tRNA4Ser and tRNA7Ser genes differ from each other only at positions 16, 34 and 77.
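The combinatorial space implied by three diagnostic positions can be sketched briefly. The snippet below is illustrative only and assumes that a three-digit gene name such as 474 records, in order, whether positions 16, 34 and 77 carry the tRNA4Ser-type ("4") or tRNA7Ser-type ("7") nucleotide; the function names are hypothetical and no actual nucleotide identities are implied.

```python
from itertools import product

# Assumption: the three digits of a gene name (e.g. "474") give the
# tRNA4- or tRNA7-type state at coding positions 16, 34 and 77, in order.
DIAGNOSTIC_POSITIONS = (16, 34, 77)


def all_permutational_forms():
    """Enumerate every 4/7 combination over the diagnostic positions."""
    return ["".join(states) for states in product("47", repeat=len(DIAGNOSTIC_POSITIONS))]


def is_hybrid(form):
    """A form is a hybrid if it mixes tRNA4-type and tRNA7-type states."""
    return len(set(form)) > 1


forms = all_permutational_forms()
hybrids = [form for form in forms if is_hybrid(form)]
print(forms)    # the two bona fide forms plus six possible hybrids
print(hybrids)
```

Under this reading there are eight possible forms: the two bona fide genes (444 and 777) and six conceivable hybrids, of which only some are discussed in the text as having been recovered.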
As alluded to earlier, there are several permutational forms that are altered strictly at these three positions (in addition to the modified 444* gene). Changes at these sites are significantly non-random, with substitutions alternating only between the nucleotides that are diagnostic of one or the other bona fide gene. Such predictability in the observed mutations was the mainstay that inspired Cribbs (1982) to hypothesize that the tRNA4,7Ser genes are co-evolving. It was further proposed that the hybrid sequences, which arose from non-reciprocal recombination between the two parental genes, are forceful testimony to the existence of such a dynamic evolutionary process (p. 160 of Cribbs, 1982). If such a process were indeed dynamic and continually renovating the tRNASer genes at 12DE, then a convincing biochemical demonstration of non-reciprocal recombination as the underlying mechanism would be the isolation of alternative permutations of the same tRNASer genes. It was thus disappointing to find that the equivalent genes represented by pDt73 (474) and pDt16R (774 and 777 genes), isolated from different fly strains and species, failed to support this contention directly, since the corresponding genes are all identical in sequence (Chapter I). Even though the structural sequences themselves have remained static, the strikingly non-random distribution of the flanking homology patches threaded among the tRNASer genes at 12DE, both in sequence content and in spatial alignment, does indirectly support the notion that the hybrid genes are likely to be descended from multiple lineages (Chapter II, Part I). Further, the morphology of these homology patches is reminiscent of intrachromosomal conversion events (Baltimore, 1981; Cami et al., 1984). Statistically, each small homology patch could have arisen independently by chance approximately once in 1000 to 65,000 nucleotides (5 to 8 matches, respectively).
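The quoted range follows from simple arithmetic, sketched here under the assumption (not stated explicitly in the text) that the four bases are equally frequent and independent, so that a specified run of k bases is expected about once in every 4^k positions. The function names and the 100 bp window in the usage lines are illustrative.

```python
def expected_spacing(match_length, alphabet_size=4):
    """Expected number of positions per chance occurrence of one specific
    run of `match_length` bases, assuming equiprobable independent bases."""
    return alphabet_size ** match_length


def expected_chance_matches(window, match_length, alphabet_size=4):
    """Expected number of chance occurrences of one specific run within a
    window of `window` bases (every start position counted)."""
    start_positions = max(window - match_length + 1, 0)
    return start_positions / expected_spacing(match_length, alphabet_size)


print(expected_spacing(5))   # 1024, i.e. roughly once in 1000 nucleotides
print(expected_spacing(8))   # 65536, i.e. roughly once in 65,000 nucleotides
print(expected_chance_matches(100, 5))
```

On these assumptions, even the shortest patches considered (5 matches) are expected by chance less than once in a 100 bp window, and several aligned patches in the same window would be correspondingly less likely.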
However, in pair-wise comparisons of 5' regions of less than 100 base pairs, I have observed more than one homology block. More importantly, these blocks of homology are never scattered at random; instead they are almost perfectly aligned in a linear order without any need to introduce major sequence distortions (e.g. large insertions or deletions, loop-outs, inversions, etc.). Such homology patterns are, moreover, not the exclusive property of one gene pair, but are a recurring theme involving almost all tRNASer genes at 12DE (pDt27R 444-2 being the only exception).

The intraspecific pattern of flanking sequence homologies (sequence homogenization in Dover's parlance) in the evolution of multigene families, fuelled mainly by biased gene conversion, is central to the theory of Molecular Drive (Dover, 1982). If the shared homology patches observed in the tRNASer genes were indeed generated by a similar mechanism, then Molecular Drive predicts that they should occur in only one species, and that the identical regions of the homologous genes from other species should take on another set of shared homology patches. This has been tested directly by comparing homologous tRNASer genes across species.

Two different types of homology pattern have been observed in the 5'-flanking regions of the tRNASer genes. The first type (type I) comprises the homology boxes at -5 and -20. These are well conserved in sequence, spatial distribution and gene-type specificity (that is, they are associated only with either the 444 or the 777 genes), and they transgress species boundaries. Since sequence conservation in the 5'-flanking regions of tRNA genes is rare (see Introduction), it is tempting to suggest that the existence of these homology elements reflects functional selection for their role as important modulatory signals in the cell. More pertinent to this study is the second type of homology pattern (type II), associated with the hybrid genes.
Their occurrence is more difficult to reconcile with functional selection alone. These homology patches are extremely heterogeneous in sequence, in size and in spatial distribution within the 5'-flanking regions. More importantly, these homologies do not usually transgress species boundaries unless they also overlap with the -5 and -20 elements.

What are the origins of these homology patches? This can be best understood if attention is shifted to the comparison of homologous genes from the different species. When their 5'-flanking sequences are aligned, random mutations scattered sporadically throughout their lengths are readily apparent. Superimposed on these background mutations are regional occurrences of clustered base changes. These clustered changes are not generated de novo by random mutation, but show a high degree of homology, in both sequence content and positional alignment, with another non-allelic tRNASer gene in the same genome (or species). For further clarification, a specific case is discussed below with the aid of an illustration.

[Illustration. Labels: 444, 474; mel, ere; DNA TRAFFICKING; CLUSTERED CHANGES.]

If the homologous 444-1 (black bars) and 474 genes (open bars) are compared between erecta and melanogaster, both sets of genes show clusters of base changes in the immediate 30 base pairs 5' to the structural gene (shown as either open or filled small rectangular patches 5' to the genes). When the 444-1 gene is now juxtaposed against the intraspecific 474 gene, what were originally identified as clustered base changes between homologous genes are almost perfectly aligned in sequence and position between these non-allelic genes (illustrated as similar rectangular patches). It is unlikely that these flanking sequences from homologous genes have undergone such substantial mutational alterations during speciation and yet, within a species, have remained conserved.
Parallel evolution of such precision in the gene pairs, driven by random mutation alone, would thus seem weak empirically, not to mention unlikely on the grounds of mathematical probability. I propose instead that the clustered base changes above the background of random mutations are regional perturbations introduced by sequence trafficking between non-allelic genes within the same genome (see section 4 for possible mechanisms); in this specific example, between the 444 and 474 genes.

A similar argument can also be made tentatively for the 774 gene in pDt16R. Relative to the 474 genes above, the data here are less dramatic because the homologous pDt17R-777 genes have not been cloned from D. erecta and D. yakuba. However, the patterns of homology revealed by the 774 genes from three different species trade on a familiar theme: first, inter-species comparisons of homologous 774 genes show coincidental clustered base changes; second, in the documented case of melanogaster, the replaced cluster shows sequence alliance with the intraspecific pDt17R-777 gene. Again, it is unlikely that these clustered base changes in the melanogaster 774 gene arose simply de novo; rather, the replaced block of sequence is consistent with DNA information trafficking from another gene at 12DE.

In the 3'-flanking regions of the melanogaster tRNASer genes, only a few members show a significant level of sequence homology. Their scarcity could indicate that only a small number of "donor" genes were involved in transmitting the 3' sequence information to the different hybrid genes (see figs. 53 and 54). An alternative or additional reason could be that the 3' flanks invariably show faster rates of sequence divergence, particularly the region between the 3' end of the structural gene and the poly-T termination signals, as ascertained by comparisons of homologous genes between different species (fig. 46).
As a result, any evidence suggestive of common descent could quickly become obliterated.

Finally, there is another interesting correlation which supports co-evolution of the genes via gene conversion. The changes at nucleotides 16 and 77 internal to the genes accurately presage the type of immediate flanking sequences in the hybrid genes. It appears that the internal and external nucleotide changes are always inherited as a unit, as if they were products of co-conversion.

In summary, the following parallel lines of observation lend strong support to the argument that the X-linked tRNA4,7Ser genes are probably co-evolving as a cohesive unit. Further, the overall observations from sequence comparisons are in keeping with the central framework of Molecular Drive, namely intraspecific sequence homogeneity in multigene evolution driven mainly by biased gene conversion:
(i) The permutational nature of the hybrid genes suggests that the internal base changes are non-random. Except for the 444*, all nucleotide replacements within the genes fluctuate strictly between those of either tRNA4Ser or tRNA7Ser.
(ii) The flanking regions of the hybrid genes show patchwork homologies characteristic of known tRNA4,7Ser genes at 12DE.
(iii) Particularly significant are the 5' and 3' segmental homologies in D. melanogaster tRNASer genes, which show sequence alliance with a non-allelic gene and do not usually transgress species boundaries (i.e. conforming to the concept of sequence homogenization).
(iv) The multiple type II homology blocks between non-allelic genes can be aligned in a linear order without having to distort the sequences artificially. Moreover, such patchwork homology patterns are a recurring theme involving almost all tRNASer genes at 12DE (pDt27R 444-2 being the only exception).
(v) In the non-allelic gene pairs that show parallel clustered base changes, one member is consistently a bona fide gene, the other a hybrid gene.
Such a relationship is reminiscent of a donor and a recipient, respectively, participating in the presumed conversion event.
(vi) The inheritance of the immediate flanking sequences in the hybrid genes is always coupled to the expected nucleotide changes at positions 16 and 77 internal to the genes.

One may still argue that any one of the preceding observations could have arisen fortuitously from random mutation followed by the vagaries of drift and selection. Selection as the all-encompassing force would be difficult to reconcile with the data, since the type II homology patterns are a conglomerate of heterogeneous sequences with no distinctive features. Moreover, the concatenation of such individual (and independent) chance events, nucleotide by nucleotide and coincidentally in different species, somehow leading inexorably to the consistent patterns observed, would be exceedingly unlikely. A simpler and more compelling hypothesis is that the tRNASer genes at 12DE are co-evolving as a unit and that the type II homology patches are indicative of the active DNA trafficking involved in such an evolutionary process.

3. A Model Postulating the Origins of Type II Homology Patches

The foregoing observations seem to consolidate naturally into a simple yet coherent scheme delineating the possible origins of most of the hybrid genes (including the 444*), although in all likelihood it addresses only the most recent events in their long history of co-evolution (at least 13 to 37 million years) (fig. 28). This scheme invokes the smallest possible number of information transfer events (for now, the events could be interpreted as either recombination or gene conversion). For clarity, I have explained the scheme in two separate figures. The first, shown in fig. 53, is a genealogy tracing the origins of most of the hybrid genes.
Each of the steps involved, indicated as A (to form the 474 gene), B (pDt17R-774), C or D as equal possibilities (pDt16R-774), and E (444*), is explained in detail in fig. 54. The repertoire of hybrid genes cascading from these interactions appears to hinge on the pDt17R-774 gene as the crucial intermediate (fig. 53). However, the origin of this intermediate remains difficult to assess. For simplicity, I have occasionally used the general term "recombination" below as an operational description that includes all exchanges of information between genes, and not as a direct inference about the actual mechanism involved. In section 4, the possible mechanisms will be discussed in detail.

Fig. 53. Genealogy delineating the formation of the tRNASer genes at 12DE. In all likelihood it traces only the most recent events in their transmission of DNA information. As shown by the diagram, all type II segmental homologies surrounding the tRNASer genes can be traced to pDt16R-777 and an unknown tRNASer gene. A, B, C or D as equal possibilities, and E symbolize the steps leading to the formation of all the known hybrid genes and the 444* gene. These steps are explained in detail in fig. 54. Open arrows indicate the direction of, and the products derived from, the information transfer events. Dotted arrows indicate possible additional flow of information that has obscured the origin of the hypothetical 444 gene.

In fig. 54, the following steps are explained:
- Step A: the 474 gene could be created by information transfer events involving the pDt17R-774 and 444-1 genes. Even though the 474 gene could be explained by a simultaneous double recombination event involving the 444-1 and one of the 777 genes, this seems to be a less likely explanation.
From such a double recombination event, both the immediate 5'- and 3'-flanking regions of the 474 gene would be expected to display sequence alliance with the 444-1 gene. However, in the present case, the immediate 3'-flanking region of the 474 gene shows stronger homology with the pDt17R-774 gene than with the 444-1, thus suggesting that the formation of the final recombinant occurred via two or more steps rather than as simultaneous events.
- Step B: as mentioned above, the origin of the pDt17R-774 gene is probably more complex. Since this is the first intermediate in the scheme, it would be logical to assume that it was derived from recombination between a 777 and a 444 gene as the initiating event. While its 5'-flanking region would suggest that the pDt16R-777 gene could be involved, its 3'-flanking region shows no homology with either of the known bona fide tRNASer genes. One possible scenario could be an information transfer event involving the pDt16R-777 and a 444 gene to form a transient pDt17R-774. The 3'-flanking region of this product could have diverged and then subsequently acted as the donor in creating the 474 and the pDt16R-774 (as discussed below). Alternatively, the present-day form of the pDt17R-774 3' end could have already undergone additional interactions with the 474 or the pDt16R-774 genes. This would be analogous to the co-evolution of the immediate 5'-flanking regions between the intraspecific 474 and its respective 444-1 genes described previously in Chapter II (Part I).
- Steps C and D: the pDt16R-774 gene could be generated by two equally likely lineages. First, pDt17R-777 and pDt17R-774 could be the information donors that formed the pDt16R-774 (step C). Alternatively, the same hybrid gene could have been generated by information transfer between the pDt17R-777 and the 474 genes (step D).
The similarity in the immediate 3' regions of the 474 and the pDt17R-774 genes would preclude any clear distinction between the two

Fig. 54. Schematic stepwise diagrams showing the possible lineages of hybrid tRNA4,7Ser genes encountered at 12DE based on their shared flanking homologies. Each of the steps (A, B, C, D and E in fig. 53) leading to the formation of a particular hybrid gene is illustrated. The recipient of the information transfer leading to the present-day form of the hybrid gene is sandwiched between the putative donor sequences. The genetic information of one tRNASer type is depicted as filled boxes, with the associated flanking regions as thick lines. In contrast, the genetic information of the other tRNASer type is shown as open boxes, with the flanking sequences distinguished as thin lines. The pathways of information transfer are shown as square waves between the genes. The dotted part of the square waves indicates the possible extent of the regions involved. Step (A) shows the creation of the 474 gene. The genetic information could have been donated by the pDt17R-774 and 444-1 genes in two independent steps rather than in a simultaneous event. Step (B) attempts to account for the pDt17R-774 gene. While its 5'-flanking homology suggests that pDt16R may be involved, the origin of its 3'-flanking sequence is complex and may have undergone additional interactions with other hybrid genes further down the pathways. Steps (C) and (D) are two possible ways by which the pDt16R-774 gene could be accounted for. In both possibilities, one of the donor sequences is pDt17R-777; the other could be either pDt17R-774 (as in step C) or the 474 gene (as in step D). Step (E) suggests that the 444* gene could have been formed by genetic information transfer involving the pDt17R-777 gene at the 5' end between -9 and -15, while its 3' end shows the sequence signature of the 444-1 gene. The mutation at residue 52 is indicated as (*) and is probably the result of a random mutation.
[Fig. 54 panels: (A) pDt17R-774, 474, 444-1; (B) pDt16R-777, pDt17R-774, 444?, pDt16R-774?, 474?; (C) pDt17R-777, pDt16R-774, pDt17R-774; (D) 474, pDt16R-774, pDt17R-777; (E) pDt17R-777, 444*, 444-1.]

possibilities.
- Step E: the 444* gene could have been formed by an information transfer event instigated by pDt17R-777 at approximately -9 to -15 in the 5' flank. The structural sequence of the 444* gene, plus about 20 bp of the 3'-flanking region, probably involves the 444-1 gene as the other donor. The single mutation at residue 56 within the coding sequence could be random, or the direct result of misincorporation during repair following the information transfer.

Thus, with the possible exception of the pDt17R-774, the rest of the hybrid sequences and the 444* gene could be accounted for, in the simplest way, by step-wise information transfer events involving known genes at 12DE.

4. Possible Mechanisms Involved in Generating the Hybrid Genes

Heterologous recombination between tRNA4Ser and tRNA7Ser genes and/or slippage of DNA strands during replication (Jones and Kafatos, 1982; Goldsmith and Kafatos, 1984; Streisinger et al., 1966) are two mechanisms by which the hybrid sequences might arise. Either mechanism, however, would presumably also result in fluctuations (expansion and contraction) in the size of the gene family, owing to the irregular spacing and orientation of the tRNASer genes. It is also expected that the intermingled tRNAArg genes would be deleted in some instances as a result of straddling DNA slippage during replication or of unequal exchange. From limited analyses in D. erecta and D. yakuba, no deletion of the tRNA4,7Ser or tRNAArg genes has been detected at the representative loci. Thus, if the hybrid genes were formed by either mechanism, it would necessarily imply that gene dosage acts as an additional selection factor in eliminating segregants containing "improper" gene copy numbers or other associated deleterious defects.
Furthermore, unequal exchange between the 444 and 777 genes should yield equal numbers of the two reciprocal recombinants if there is no differential selection against either class of products. Thus far, only one of the two possible classes of recombinants has been observed (see below). While these two possible mechanisms for generating hybrid genes seem unlikely, they cannot be formally eliminated, since the absolute copy numbers of tRNASer genes at 12DE have not been rigorously measured in the different Drosophila strains and species.

Gene conversion and double recombination would deserve equal merit as valid explanations only if the sequence structure of each hybrid gene were interpreted independently. Either mechanism could give rise to the hybrid genes and would preserve the number of gene members for both tRNA4,7Ser and tRNAArg in the cluster. Two considerations, however, render double recombination the weaker possibility. First, on theoretical grounds, double crossovers in the same chromosome are not independent events. It is well established that crossing-over in one region decreases the likelihood that another will occur in an adjacent region of the same chromosome. This phenomenon is known as "interference" (reviewed by Suzuki et al., 1986); it refers to the observation that the number of double recombinants recovered from a genetic cross is usually lower than the value predicted from the product of their individual probabilities. Interference is generally measured over 10 to 20 map units, but it has been surmised to hold true for short distances of a few hundred base pairs. If the hybrid genes were in fact generated by frequent double crossovers, then this tRNA gene cluster would be a truly unusual genetic site, in that interference would appear to be negated in this region, allowing a large number of tightly linked double recombination events to proceed unimpeded.
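The definition of interference given above can be made concrete with the standard coefficient-of-coincidence calculation (observed double recombinants divided by the number expected if the two crossovers were independent). The sketch below is illustrative only: the function names are hypothetical and the numbers in the usage lines are invented for the example, not data from this study.

```python
def expected_double_recombinants(r1, r2, n_progeny):
    """Doubles expected if crossovers in two adjacent intervals were
    independent: product of the recombination fractions times progeny."""
    return r1 * r2 * n_progeny


def interference(observed_doubles, r1, r2, n_progeny):
    """Interference = 1 - coefficient of coincidence, where the coefficient
    of coincidence is observed doubles / expected doubles."""
    expected = expected_double_recombinants(r1, r2, n_progeny)
    if expected == 0:
        raise ValueError("expected number of double recombinants is zero")
    coincidence = observed_doubles / expected
    return 1.0 - coincidence


# Hypothetical cross: intervals of 10 and 20 map units, 1000 progeny scored,
# 8 double recombinants recovered.
print(expected_double_recombinants(0.10, 0.20, 1000))
print(interference(8, 0.10, 0.20, 1000))
```

A value near 1 means that doubles are almost completely suppressed, whereas a value near 0 means that the two crossovers behave independently; frequent tightly linked doubles, as the double-recombination model would require, correspond to the latter case.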
A second, and more forceful, argument against double recombination comes from the systematic cloning and sequencing of almost the entire tRNA4,7Ser gene family. A model based on double recombination would also predict two classes of reciprocal recombinants. Since only one class has been found (see below), this observation argues against double as well as single reciprocal exchanges as plausible models. In the absence of reciprocal products, such a model would also necessitate additional and unwarranted contrivances, invoking differential selection on the two classes of reciprocal products with no real supporting evidence.

A distinctly more parsimonious explanation is non-reciprocal recombination (probably by gene conversion-like mechanisms), as first suggested by Cribbs (1982). Non-reciprocal recombination is an operational definition describing the recovery of only one of the two possible classes of recombinants in a given genetic cross. While it is still slightly premature to tell whether the entire cast of hybrid genes (and the 444*) are non-reciprocal recombinants, as far as is known all seem to fit this description, as suggested by three parallel lines of evidence (see also Cribbs et al., 1987b). First, from the molecular walk at 12DE, none of the anticipated reciprocal hybrid sequences (747, 477 genes) has been recovered, and yet in all likelihood the tRNASer gene cluster at this major chromosomal site has been fully recovered. It seems equally unlikely that all reciprocal recombinants are located at the three minor autosomal sites, since three of the possible four genes from these sites have also been cloned, as restriction fragments purified from gels and from recombinant libraries. Two 777 genes separated by a maximum of 15 kb have been isolated from 23E on chromosome 2L from a limited chromosomal walk of approximately 50 kb (J. Leung, D. Sinclair and S. Hayashi, unpublished).
The other, as yet unmapped cytologically, has been isolated as a 2.0 kb HindIII fragment identified earlier by genomic blotting. This is also a 777 gene (D. Sinclair, unpublished). Furthermore, no patchwork homologies can be found between these autosomal genes and any of the X-linked tRNASer genes, except at the expected -5 and -20 elements (data not shown). The last and weakest hybridization band, identified as an approximately 24 kb HindIII or 9.5 kb EcoRI fragment by genomic Southern blotting (Cribbs et al., 1987b) and most likely to contain the remaining autosomal gene, has not yet been cloned. Even if this gene were one of the possible reciprocal recombinants, it still could not fully account for the three hybrid genes and the 444* gene captured in the walk.

The third, indirect line of evidence is that the homology blocks associated with the hybrid genes can be accounted for more simply by localized interactions among the X-linked tRNASer genes, without invoking recombination between the X chromosome and the autosomes as discussed above. Such illegitimate exchanges would, moreover, be burdened by high frequencies of aneuploidy. In contrast, gene conversion-like events as a viable hypothesis would embody the twin virtues of providing a more literal interpretation of the non-reciprocality of the hybrid genes with minimal assumptions, and of accounting for the import of flanking homologies while evading the consequences of aneuploidy or alteration in gene numbers.

Even though the data obtained in the above studies are circumstantial, with the almost complete characterization of the tRNASer gene structures in the genome, standard recombination (by either single or double exchange) emerges as a less appealing explanation.
In contrast, the structural sequences of the hybrid genes, their molecular organization as a tightly linked cluster and the patchwork flanking homology patterns bear extensive and striking similarities to other multigene families that are thought to co-evolve by gene conversion (Dover, 1982 and below). If the tRNA4,7Ser genes are no exception, they may conform to this general mode of cohesive evolution. It must be acknowledged, though, that while gene conversion may be the dominant mechanism, this does not in any way eliminate the other mechanisms mentioned above as minor players in the co-evolution of the tRNASer genes.

As called to attention by Baltimore (1981) and Egel (1981), co-evolving gene families usually show clustering, probably as a direct reflection of their common origin. From his surveys, Baltimore (1981) showed that allelic members of a gene family usually show strong conservation; but he also noted that base differences between alleles usually occur as clusters of several nucleotides. These clusters are never random mutations, however, but show very strong sequence identity with non-allelic members adjacent on the same chromosome. Baltimore (1981) and Egel (1981) thus further proposed that such segmental homology blocks may represent molecular evidence for gene conversion between non-allelic members of the family. Gene conversion would make intuitive sense as a mechanism to rectify deleterious mutations and to keep the multigene family (duplicated but irregularly spaced genes) evolving as a cohesive unit (Geliebter and Nathenson, 1987). Since that time, small imperfect homology patches have also been shown in other multigene families as diverse as the human fetal γ-type globins (Slightom et al., 1980; Stoeckert et al., 1984; Powers and Smithies, 1986), the γ2a and γ2b genes of the mouse immunoglobulin heavy chains (Ollo and Rougeon, 1983), the Kb, Kd and Q10 mouse class I H-2 genes (Mellor et
al., 1983) and the human embryonic ζ-globin genes (Hill et al., 1985). The organization of all these genes conforms to the prediction of Baltimore (1981) in that they are all tightly linked within a small chromosomal domain. With the exception of the human ζ genes, all other studies represent fairly extensive analyses of allelic differences within polymorphic populations. In the surveys reported above, a large proportion of the differences between alleles exist as clusters of 4-6 nucleotides (analogous to the clustered base changes between homologous tRNASer genes from the different species). These base differences are also non-random, but display strong sequence identities with adjacent and non-allelic members of the gene family. These collective studies thus provide circumstantial evidence suggesting that gene conversion may be a pervasive mechanism for constraining sequence divergence between members of a multigene family. Their data further imply that the proximity of the non-allelic gene members may enhance their frequencies of intrachromosomal gene conversion, either by sister DNA (chromatid) exchange or by looping within a DNA strand (Slightom et al., 1980; Egel, 1981; Baltimore, 1981; Stoeckert et al., 1984; Willard and Waye, 1987).

Intrachromosomal gene conversion may indeed represent the predominant mode of information exchange between tightly clustered loci in vivo, and this has been very persuasively simulated in yeast. Klein and Petes (1981) selected a yeast transformant in which the wild-type leu2+ gene, together with a bacterial vector, was integrated in the vicinity of the chromosomal leu2 gene that had previously been inactivated by two frameshift mutations. This transformed strain was then crossed with a leu2+ partner. Hence, one of the chromosomes of the diploid carried two leu2 genes in tandem, separated only by the vector DNA.
In the absence of recombination, all four spores of a tetrad should receive a copy of the leu2+ allele and grow without leucine. However, occasional tetrads were observed in which one spore was auxotrophic. When a random sample of these aberrant tetrads was analyzed further (12 of 306), the largest fraction was due to conversion between the duplicated genes (6 of the 12). Four cases were due to conversion between non-sister chromatids, and only one case was due to crossing-over (the remaining tetrad was apparently the result of multiple events). Exactly parallel studies using duplicated his4 genes, but in mitotic yeast cells, led Jackson and Fink (1981) to the same conclusions. Their studies were even more compelling, since in a total of 127 aberrant colonies examined, at least 88% could be traced directly to intrachromosomal conversion, while the remaining 12% were pooled from more complex events (either reciprocal recombinants or convertants that had subsequently participated in recombination events). Both sets of authors, based on their independent results, have also stressed that intrachromosomal conversion may be the dominant and effective driving force in sequence rectification during both meiosis and mitosis, since the overall number of genes within a family would be faithfully maintained.

Intrachromosomal conversion as the major pathway of information transfer is certainly not an esoteric phenomenon restricted to fungi. The same type of mechanism has recently been advanced by Liskay and Stachelek (1983 and 1986), with the concurrence of others (Lin and Sternberg, 1984; Smith and Berg, 1984), as also operating in mammalian cells, using direct duplications of the HSV tk gene (type I Herpes Simplex Virus) inserted into different chromosomal positions as a model.
Events consistent with non-reciprocal information transfer, or gene conversion, were found to make up a majority (50-85%, depending on the systems investigated) of the total recombination events (Liskay et al., 1984). These experiments have one undeniable advantage over the fungal studies: because the integration of the plasmids is not targeted to a specific site, the effects on recombination of different chromosomal environments can be compared. The collective studies strongly suggest that positional effects play either a minor role or none in affecting the frequency of conversion, and that the rates are largely, if not entirely, dependent on the sequence homology of the inserted duplicated fragments. Although interchromosomal conversion cannot be absolutely eliminated in these studies, most of the cell lines were verified to contain only a single integrant (as determined by Southern blotting) to minimize this effect. Thus, the above converging investigations in higher eukaryotes and in yeast seem to reinforce intrachromosomal conversion as a widely employed mechanism for maintaining sequence homogeneity in tightly linked multigene families.

The molecular workings of gene conversion remain the most stubborn problem, although several models have been advanced in recent years in an attempt to unwrap its "black-box" mystery. These models range from the simple (single DNA strand invasion of Holliday, 1964) to the more exotic (D-loop formation and branch migration of Meselson and Radding, 1978, and double-stranded break and resolution of multi-branched structures of Szostak et al., 1983); each embodies certain inherent peculiarities stemming from the unique properties of the systems under study.
Yet gene conversion is, in all likelihood, a complex phenomenon, with different aspects being exaggerated by the choice of genes and nature of the mutations, the cell cycle time, the genetic background and the biological systems employed (Fogel et al., 1978; Fogel et al., 1982; Herskowitz and Oshima, 1981; Hamza et al., 1986). All of the above models, however, do converge on a unifying theme invoking heteroduplex formation as the intermediate. The ensuing repair of the heteroduplex would thus leave a wake of patchwork sequences upon its resolution. This concept has been tested by constructing heteroduplexes of mouse H-2 genes in vitro from two partial cDNA clones of 1.15 kb and 1.0 kb in length (Abastado et al., 1984) and transforming them into either E. coli (dam⁻ and either recA⁻ or recBC⁻ strains) or COS-1 monkey cells (Cami et al., 1984). The cDNA clones differ by a large insertion at the 5' end (142 bp) and in the length of the 3' poly-A tails, in addition to many internal point mutations and small insertions and deletions of 3-9 bp, totalling 8% in sequence divergence (small insertions and deletions were scored as single mutations). The resolved heteroduplexes subsequently recovered from both cell types had indeed acquired blocks of processed patchwork copied from either parental strand, and never from de novo synthesis, as predicted by the above models. Furthermore, the authors also drew a tentative correlation between decreasing lengths of the patchworks and increasing amounts of heterology. That is, in regions of many nucleotide changes, the repair mechanisms showed no template preference, but made frequent switches between the two strands (Cami et al., 1984). These observations have been confirmed by Folger et al. (1985) by transforming heteroduplexes of different insertion mutants of the neomycin genes into mouse LMtk⁻ cells.
They also persuasively showed that repair was in fact the predominant mode of generating such processed patchwork, since co-injection of a mixture of different homoduplexes into the cells failed to produce any patchwork or even simple recombinants of any design. Such a repair process, at least in E. coli, can occur prior to, and hence independent of, resolution of the heteroduplex through DNA replication (Fischel et al., 1986). Further, such correction of heteroduplexes is severely impaired in yeasts bearing the pms1 mutation, in a gene normally thought to participate in mismatch repair (Bishop et al., 1987).

According to the recombination models, sequence homology is essential to the formation of recombination intermediates and subsequent branch migration. Several investigations have been reported on the minimum length of homology which can still catalyze efficient recombination or gene conversion between directly duplicated sequences in higher eukaryotes. Liskay et al. (1987) constructed a series of plasmids, each containing a mutant target HSV tk gene inactivated by an XhoI linker, and a donor fragment sharing various lengths of homology overlapping the mutant site. Upon transfection into mouse L cells, the integrated constructs would contain directly duplicated tk sequences separated by several kilobases of the vector. Their results showed that, for shared homologies between 1.8 kb and 295 bp in length, conversion is efficient, with the rate being directly proportional to the extent of homology (between 9 x 10"* and 2.0 x 10⁻⁶ conversion events per cell division). In contrast, conversion with either 200 bp or 95 bp of homology was still detectable, but the rate was reduced at least seven- to 100-fold, respectively, relative to that observed with the 295 bp donor fragment. A similar approach was taken using an SV40-pBR322 plasmid construct also containing directly duplicated fragments flanking a single SV40 genome (Rubnitz and Subramani, 1984).
Recombination between the duplicated fragments would thereby liberate viable SV40 virions in the process. Since SV40 DNA has a specific infectivity of 3 6 x 10⁶ PFU/mg, a specific infectivity, or recombination frequency, of only 0.001% would still produce plaques. By progressively trimming the length of homology from 5,243 to 0 bp with Bal31, they showed that the steepest drop in recombination frequency occurred between 163 and 214 bp (0.30% relative to wild-type), with lower but persistent levels of recombination occurring when there was only 14 bp of homology (0.002% compared to wild-type SV40). In prokaryotes, the region of homology required can be shorter still. It has been shown in vitro that the recA protein from E. coli is able to extend hybrid-DNA formation through regions of DNA with an average of 1 mismatch per 10 bp (DasGupta and Radding, 1982).

Another unusual feature of the tRNAs recognizing UCN codons is that the evolution of their respective genes appears to be cohesive across the eukaryotic kingdom. Thus far, these are the only recorded examples of pairs of tRNAs with almost complete sequence identity that are nevertheless functionally distinct isoacceptors. In all cases they share 96-98% homology, as completely determined in S. cerevisiae (Etcheverry et al., 1979), S. pombe (Rofer et al., 1979) and D. melanogaster (Cribbs et al., 1987a). Incomplete sequence data indicate that such a pair may be present in the rat as well (Rogg and Staehelin, 1971; Rogg et al., 1973).

The concerted evolution of the genes encoding UCN-reading tRNAsSer has been extensively studied in S. pombe in a series of elegant experiments conducted by Kohli and colleagues. The family of genes coding for two minor serine tRNA isoacceptors consists of three members. Two genes (sup3+ on the right arm of chromosome I and sup9+ on chromosome III) code for tRNAs reading the codon UCA, and one gene (sup12+, on the left arm of chromosome I) codes for a UCG-reading tRNA (Munz et al., 1981).
The sequences of all three genes are very similar in the regions corresponding to the mature tRNAs. Besides the obvious base differences at the anticodon, the only difference resides at the tip of the tRNA extra arm (sup3: T; sup9 and sup12: C). All three genes have introns. While those of sup3 and sup9 are identical and 15 bp long, that of sup12 differs from the other two at six positions and is 1 bp longer. Intergenic convertants have been obtained by three different selection schemes (Amstutz et al., 1985; Munz et al., 1982; Heyer et al., 1986). All three were designed to estimate independently the frequencies of conversion (1 x 10⁻⁶ to 2 x 10⁻⁵) and to give indications of the possible minimum lengths of the repaired heteroduplexes. For example, most crosses consist of the selfing of strains carrying a suppressor gene that has been inactivated by a secondary mutation outside the anticodon (Munz et al., 1981). The parents of such a cross are identical throughout the genome (except for mating type) and consequently recombination between alleles cannot lead to the creation of active suppressors. Among the progeny, sequence analysis performed on convertants from all three schemes has shown that all three loci were involved in DNA transfer, and that information trafficking between any two members of the gene family occurred in both directions. Moreover, with a more relaxed screen that did not demand functional tRNAsSer (either as suppressors or wild-type), some of the convertants contained imported DNA patchwork sequences that can be as short as 18 bp (Heyer et al., 1986).

Thus, in returning to the Drosophila tRNASer genes at 12DE, the constraints on their genic sequences and their patchwork flanking homologies could be explained in the context of what is known about the co-evolution of multigene families discussed above.
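The notion of a patchwork of imported sequence, as invoked for the convertants and hybrid genes above, can be made concrete with a short sketch. It assumes three pre-aligned sequences of equal length (a hybrid and two candidate parents) and labels each position by the parent it matches. The function names and the toy sequences are hypothetical and are not taken from the sequencing data of this work; the code only illustrates the bookkeeping, not the analysis actually performed.

```python
def label_positions(hybrid, parent_a, parent_b):
    """Label each aligned position by the parent that the hybrid matches.

    'A' or 'B' : informative site (parents differ) matching that parent
    '='        : uninformative site (parents identical, hybrid agrees)
    '?'        : hybrid matches neither parent at this position
    """
    if not (len(hybrid) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be pre-aligned to equal length")
    labels = []
    for h, a, b in zip(hybrid, parent_a, parent_b):
        if a == b:
            labels.append("=" if h == a else "?")
        elif h == a:
            labels.append("A")
        elif h == b:
            labels.append("B")
        else:
            labels.append("?")
    return "".join(labels)


def patchwork_tracts(labels):
    """Collapse informative sites into (parent, start, end) tracts.

    A tract runs from the first to the last consecutive informative site
    assigned to the same parent; 'end' is exclusive. Uninformative sites
    lying between two sites of the same parent are absorbed into the tract,
    so the reported length is a minimum estimate of the imported block.
    """
    tracts = []
    for i, lab in enumerate(labels):
        if lab not in ("A", "B"):
            continue
        if tracts and tracts[-1][0] == lab:
            tracts[-1] = (lab, tracts[-1][1], i + 1)
        else:
            tracts.append((lab, i, i + 1))
    return tracts


# Toy example (made-up sequences): the parents differ at positions 2, 6 and 10.
parent_a = "ACGTACGTACGT"
parent_b = "ACCTACTTACAT"
hybrid = "ACGTACTTACAT"

labels = label_positions(hybrid, parent_a, parent_b)
print(labels)                    # ==A===B===B=
print(patchwork_tracts(labels))  # [('A', 2, 3), ('B', 6, 11)]
```

Because tract boundaries can only be placed at sites where the two parents differ, a repaired tract that covers no such site leaves no trace in this kind of comparison.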
The tight linkage between tRNA4Ser and tRNA7Ser would suggest that on occasion these genes could mispair due to the similarity of their coding sequences. Their length and degree of homology should be sufficient for stable heteroduplex formation, as supported by the various published results cited above. This contention is also strengthened by the strong cross-hybridization observed between the two gene types, even under stringent conditions, either by in situ hybridization (Hayashi et al., 1980) or in Southern blotting experiments during the course of this work (also consult Cribbs et al., 1987b). It is also possible that the surrounding repeats could act as accomplices in assisting the formation of the heteroduplex. This type of phenomenon is not entirely novel; it has been documented previously as aberrations at the white locus caused by misalignment and high-frequency recombination between adjacent roo (Davis et al., 1987) and between copia (M. L. Goldberg et al., 1983) transposable elements that are at least 38 and 60 kb apart, respectively. The transient heteroduplexes of the misaligned tRNA4,7Ser genes could then be partially repaired using either DNA strand as a template, leaving a wake of processed patchwork in the flanking regions and, in some cases, also partially extending into the structural genes to create the genre of hybrid tRNASer sequences at 12DE. The amount of patchwork homology in both the 5'- and 3'-flanking regions would probably be governed by the fidelity of the repair process as well as by the spontaneous mutation rates subsequent to the repair. If these conversion-like events were dynamic, as suggested by Cribbs (1982), it is not clear why other permutational forms have not been found. It is possible that they may exist at low frequencies.
However, the preponderance of certain permutations may reflect the relative physical proximity of the tRNA4,7Ser genes on the chromosome, or the nature of their surrounding sequences, such as a higher density of shared repetitive elements, which could also favor the type or direction of conversion events among the tRNA4,7Ser genes. This last point is in fact emphasized by the observation that the 444-1 and the 474 genes do show evidence of cohesive evolution in both erecta and melanogaster. If the subsequent repair of the heteroduplex is never extensive, and the initiation is preferentially near the ends of the structural gene, such that one end of the repaired tract frequently terminates somewhere prior to the anticodon, the processed patchwork would certainly go undetected, because the final hybrid structure would still be a 474 gene. Obviously, if the other end of the repaired tract extends into the flanking regions, where heterology between the two genes can be conveniently used as "landmarks", the repaired tracts could then be easily discernible as small intraspecific homology patches above the background of heterology. Thus, the inability to detect other permutations may be simply due to constraints imposed by favored interactions, as well as by short repaired tracts.

The persistence of the same hybrid genes at homologous sites in the three different sibling species, spanning their entire evolutionary history, could also suggest that there may be selection at work. It is not known whether the hybrid genes are functional in vivo, although those from melanogaster can support in vitro transcription at reduced rates compared to the bona fide genes (St. Louis, 1985; St. Louis and Spiegelman, 1985). It must be emphasized that even though the hybrid genes may be transcribed in vivo, the integrity of at least the 474 gene is not essential for viability (see mapping studies in fig. 52).
Therefore, functional selection cannot be an all-encompassing factor in favoring (or eliminating) certain types of hybrid genes. The hybrid tRNA4,7Ser genes, with their attendant flanking segmental homologies, resemble partial, rather than complete, convertants. Such partial convertants, or transition stages (Strachan et al., 1985), in the evolution of multigene families appear to be extremely prevalent. In addition, at the DNA sequence level, blanket homogeneity in any multigene family has thus far never been observed (Strachan et al., 1985; Baltimore, 1981 and relevant references on allelic differences cited on p. 221), suggesting that the spread and subsequent fixation of any variant through the entire family is a slow process (Dover, 1982).

5. tRNAArg Genes from Drosophila Sibling Species

The tRNAArg genes corresponding to the pDt27R region have been cloned from D. erecta and D. yakuba. In both of these sibling species there is only one gene at this site. Their structural sequences are identical, but both differ from the melanogaster duplicated units by retaining a "T" at position 13, rather than a "C". In comparing the 5'-flanking regions between the sibling species, most conservation is seen in the immediate 28 bp. This corresponds to the 5' limit of the 200 bp duplicated cassettes in melanogaster. Beyond -28, for at least 50 nucleotides, there is much more sequence divergence; but in both the erecta and yakuba genes the residual homologies are closest to pArg12.1 in melanogaster. In both the pDeArg-1 and pDyArg-1 genes, the flanking regions here contain many small deletions of 1 to 15 bp relative to melanogaster. All three sibling species diverge from each other by a minimum of 15%, if all deletions are scored as single differences. Likewise, the most conserved nucleotide block in the 3'-flanking regions of the sibling species corresponds to the direct repeats, TAGCCCAA, at about 68 bp 3' to the structural gene.
This small oligonucleotide is in turn embedded in highly homologous, but non-identical, sequences that range from 25 bp long in yakuba to 43 bp long in erecta. Downstream from this region, there is extremely strong homology between pDeArg-1 and pDyArg-1. Both of these sequences also show strong identity with the corresponding 3' region of pArg12.4 of melanogaster. Thus, the most conserved 5'- and 3'-flanking regions in the melanogaster tRNAArg cluster, relative to the sibling species, correspond to those of the two outermost genes, respectively. It was also noted that most of the conserved nucleotide blocks shared among all three sibling species in both the 5'- and 3'-flanking regions tend to be potential hairpin structures. Whether they are fortuitous or in fact play some biological role, such as serving as recognition signals within the cell, is unknown. The most rapidly diverging sequences in the tRNAArg genes from the different species occur in the 40-50 nucleotide long interstitial regions between the 3' end of the structural gene and the direct repeat. While there are approximately 8 point mutations and one 15 bp long deletion in pDyArg-1 relative to pDeArg-1, both genes are completely divergent from the corresponding segment in melanogaster.

Fig. 55. The four tRNAArg genes and flanking sequences between the direct repeats from the Drosophila sibling species are summarized. The structural gene is represented by the open box. MEL, melanogaster; SIM, simulans; ERE, erecta; YAK, yakuba. The 5'-flanking sequences (thick lines) for all four genes are highly conserved, whereas the 3' sequences between the end of the structural genes and the direct repeats are much more diverged. The genes can be divided into two groups based on the mutation within the coding sequence and the homology imparted by their 3' tails.
The 3' tails belonging to melanogaster and simulans are closer in sequence homology to one another (dotted lines), and those belonging to erecta and yakuba show some sequence similarities (~60%). Between the two groups (which coincidentally belong to the two species complexes in fig. 28), however, they are completely diverged. Across the phylogenetic tree, there appears to be a tendency for the 3' tails to expand in size from yakuba to melanogaster. Also, the two genes from simulans and melanogaster differ from those of erecta and yakuba by a T13 to C13 mutation within the structural sequence (small black squares). The two direct repeats of simulans are not perfectly conserved, while those from melanogaster are identical. Their distances in base pairs from the structural gene are indicated by the numbers above the genes. In melanogaster, the two different 5'-flanking regions indicated depend on the size of the duplicated units (600 bp or 200 bp, shown as either 449 or 31, respectively). The smaller duplicated units have a single large deletion in the 5'-flank. The 5' direct repeats in erecta and yakuba have not been sequenced but are certainly more than 31 bp upstream from the gene, suggesting that all duplicated units in present-day melanogaster are most likely to have been derived from a 600 bp antecedent.

Recently, the corresponding tRNAArg gene has also been isolated from D. simulans by C. Newton (unpublished). DNA sequence analysis showed that it is identical to that in melanogaster, having the less prevalent "C" nucleotide at position 13. Both its 5'- and 3'-flanking sequences are quite homologous to those in melanogaster. Most of the differences are point mutations or single base pair insertions/deletions, totalling approximately 7%. The most diverged region is also in the interstitial sequence between the end of the structural gene and the direct repeat.
The differences noted here, besides several base changes scattered throughout, also include a large deletion (~13 bp long) near the putative poly-T termination signal. A schematic diagram summarizing the configurations of the tRNAArg genes across the phylogenetic tree is provided in fig. 55.

In Newton's extensive surveys by Southern blotting of approximately 45 different strains of D. melanogaster from Europe, North and South America, Asia and Africa, he showed that 40 of these contained the four duplicated genes, while the remaining five had three genes. In contrast, all sibling species surveyed contained only a single gene (Newton et al., 1987). Thus, it is likely that the tRNAArg genes duplicated only recently, sometime after the divergence of the simulans and melanogaster species. From the above sequence data, it appears that the single mutation at position 13 within the structural gene occurred before the divergence of the two species at least 4 x 10⁶ years ago. With the identification of the direct repeat flanking the tRNAArg genes, it seems most likely that at least the first duplication was by unequal crossing over at or near these sites. The point of crossing over is not precise, however, since about 20 bp downstream of the TAGCCCAA repeat are included in pArg12.2 and pArg12.3, before the beginning of the next duplicated unit. The phenomenon observed with the tRNAArg genes would be analogous to the expansion of a cluster of three tRNAGlu genes reported by Hosbach et al. (1980), although in their case no antecedents to the duplicated forms were investigated. To date, no hypothetical two-gene configuration for the tRNAArg genes has been identified (Newton, personal communication). The reason for this is unclear, but a trivial reason could be that the strains used in the surveys are not truly representative of the population at large.
Alternatively, it is possible that the two-gene configuration is unstable or recombinogenic, quickly catalyzing further duplication events. This could be due to the increase in the length of sequence homology provided by the extra copy of the tRNAArg gene in the hypothetical intermediate. Certainly the preponderance of the four-gene over the three-gene configuration, and the absence of the one-gene configuration among the different melanogaster strains, strongly suggest that once the duplication process started, there was little impediment to further duplications. Gene copy numbers higher than four have not yet been detected. Since this appears to be an evolutionarily recent phenomenon, it may be that sufficient time for this to occur has not elapsed. Alternatively, there may be selection against higher numbers of tRNAArg genes imposed by extraneous factors such as transcriptional components or post-transcriptional modification enzymes. This latter point could be tested by P-element transformation to increase the copy number in the genome artificially. From fig. 55, the 5' direct repeat is more than 31 bp upstream from the structural genes in both erecta and yakuba, and since the homology extends far beyond this region, it would strongly suggest that the predecessor that gave rise to the duplicated genes in melanogaster was likely to be the 600 bp unit.

In &DE16-82, the 3' half of another tRNAArg gene, corresponding to pArg12.6 in D. melanogaster, was also sequenced. They are transcribed in opposite orientations, relative to their respective tRNASer genes, in the two species. Further, the large intergenic sequence separating the tRNAArg from the tRNASer genes is not present in D. erecta. It is not known whether the intergenic DNA is actually deleted in the latter species, or whether the difference in organization simply results from a sizable inversion. Attempts to resolve this issue were hampered by the fact that several probes prepared from D.
melanogaster corresponding to this intergenic region were inundated with repeated sequences (see fig. 60), which were also present in erecta and all other sibling species tested (not shown).

In summary, the studies on the tRNASer and tRNAArg genes at 12DE showed that this chromosomal site is undergoing rapid evolution. Both the creation of the hybrid tRNA4,7Ser genes and the expansion of the tRNAArg gene cluster at this site could be due to misalignment of DNA. Such misalignment could be caused by the redundant nature of the genes themselves and could be further stabilized by the presence of many repeated elements in the region. The existence of many repetitive and repetitious elements at 12DE had been predicted earlier from cytological analyses (Rudkin and Tartof, 1973), and has in fact been corroborated by the present chromosomal walk, including the cluster of Ste sequences. Moreover, from casual inspection of the sequence obtained for pDt27R (Newton, 1984), numerous small repeats ranging from 20 to 50 bp can be easily identified. Some of these are composed of complex sequences, while others are simple alternating TG and CA nucleotides or more monotonous poly-T and poly-A tracts. The mispairing of DNA at the white locus (M. L. Goldberg et al., 1983; Davis et al., 1987) and recombination within the rDNA arrays in both yeast (Szostak and Wu, 1980) and Drosophila (Coen and Dover, 1983; Gillings et al., 1987), in addition to the globin genes in humans (Proudfoot and Maniatis, 1980), suggest that DNA slippage at redundant loci could be a frequent contributing factor to chromosome aberrations and to the evolution of multigene families in general.

The sequences obtained for the tRNASer and tRNAArg genes reported here from D. erecta, D. yakuba, D. melanogaster (Canton-S and Oregon-R) and D. simulans (Newton, unpublished; Cribbs et al., 1987b) showed that their 5'-flanking sequences are always more constrained than their 3'-flanks.
As well, the degrees of divergence as imparted by 5'-flanking comparisons are in general agreement with the phylogenetic relationships established for the different sibling species based on chromosomal aberrations (Ashburner et al., 1984). While, on average, simulans shows approximately 7% divergence from melanogaster, D. erecta and yakuba differ from each other and from melanogaster by an average of 15-30%. Using the estimate of 0.5-1.0% difference in nucleotide changes for every million years of divergence (Bodmer and Ashburner, 1984), this would suggest that simulans diverged from melanogaster approximately 3.5-7 million years ago. D. erecta and yakuba would have diverged from each other and from melanogaster between 7 and 15 million years ago. All these estimates are within the general limits given by previous studies (see references cited in Chapter II, Part II). According to the topological relationships established by Ashburner et al. (1984), however, one would not expect the same degree of sequence divergence from pairwise comparisons between yakuba and melanogaster, and between the latter species and erecta. One tenable hypothesis could be that there may be a wide range of overlap in sequence divergence, contingent on the choice of genes (or regions of DNA) being investigated. It could be equally likely that the evolution of the Drosophila species is a complicated plexus and not the linear process implied by the topological relationships.

APPENDIX

CHAPTER IV

tRNA3bVal Genes and Related Sequences

Seven tRNAVal isoacceptors have been resolved by RPC-5 chromatography in D. melanogaster. tRNA3aVal, tRNA3bVal and tRNA4Val constitute the major isoacceptors, whereas tRNA1Val, tRNA2Val, tRNA5Val and tRNA6Val constitute the minor species (Dunn et al., 1978). tRNA3bVal is the second most abundant valine isoacceptor and binds to the ribosomes in response to the valine codon GUG (Dunn et al., 1978; Addison, 1982).
Its nucleotide sequence has been determined (Addison, 1982; Addison et al., 1984). In situ hybridization to polytene chromosomes using purified tRNA has localized the coding sequences for these tRNAs to two major sites at 84D3-4 and 92B1-9, and one minor site at 90BC, on the right arm of chromosome 3 (Hayashi et al., 1980 and 1981). Grain distribution over these sites suggests a template ratio of 5:4:1, respectively. By measuring tRNA3bVal levels in Drosophila mutants with altered gene dosage at the two major loci, both Dunn et al. (1979a) and Larsen et al. (1982) have independently demonstrated that genes at 84D3-4 and 92B1-9 are actively transcribed in vivo. Moreover, each of the sites contributes approximately 50% of the total cellular tRNA3bVal pool, in direct proportion to gene dosage. In contrast, no substantial level of transcription was detectable from the tRNA3bVal genes located at 90BC. The results suggest that most, if not all, of the active genes reside at the two major loci. The lack of detectable transcription from 90BC would suggest that there may be only a small number of templates, in accordance with the reduced level of in situ hybridization at this site. Alternatively, the sequences at 90BC could be gene vestiges that are transcriptionally silent, a hypothesis that could also explain the reduced affinity for the tRNA probe.

A homeostatic mechanism for compensating for the loss of a large portion of the tRNA3bVal templates appears to exist, since heterozygous deficiency for 84D caused no change in total valine acceptance (Dunn et al., 1979a). It is not known how this adjustment is achieved, but compensation by selective gene amplification of the remaining tRNA3bVal templates has been ruled out (Larsen et al., 1982).
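The comparison between the in situ grain ratio and the measured pool contributions can be made explicit with a small calculation. The sketch below assumes, purely for illustration, that every template is equally active, so that each site's share of the tRNA3bVal pool equals its share of the templates; the function name `expected_shares` is hypothetical and is not part of the original analysis.

```python
from fractions import Fraction


def expected_shares(template_ratio):
    """Return each site's expected share of the pool.

    Assumption (illustrative only): all templates are transcribed
    equally, so the share is proportional to the template count.
    """
    total = sum(template_ratio.values())
    return {site: Fraction(count, total) for site, count in template_ratio.items()}


# Template ratio suggested by grain distribution (5:4:1).
ratio = {"84D3-4": 5, "92B1-9": 4, "90BC": 1}
shares = expected_shares(ratio)

for site, share in shares.items():
    print(f"{site}: {float(share):.0%} of the pool expected")
```

Under this assumption the minor site would supply about one tenth of the pool, whereas the dosage experiments attribute roughly half of the pool to each major site and detect no substantial contribution from 90BC.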
The wealth of available genetic data, unprecedented for any other Drosophila tRNA gene family, has in part prompted a further investigation of the expression of tRNA3bVal genes at the molecular level, and in particular of whether the sequences at 90BC are tRNA3bVal gene vestiges. The results presented here have been a collaborative effort with Drs. W. R. Addison, A. D. Delaney and R. M. MacKay, and have been published in Leung et al. (1984).

RESULTS

Sequence Analysis of pDt41R

A plasmid derived from 90BC, designated as pDt41R, was isolated from a pBR322 library containing HindIII-cut D. melanogaster DNA (Dunn et al., 1979b). The recombinant plasmid contains a 2.0 kb DNA fragment that hybridizes to tRNA3bVal (data not presented). An infrequently encountered restriction site, XmaI, was identified in the TψC loop of the tRNA3bVal and was used as a marker to locate the gene within the Drosophila DNA. Two genes have been identified, corresponding to the XmaI sites in the insert: one for a gene similar, but not identical, to the expected tRNA3bVal and the other for a tRNAUGGPro. The sequenced portion of the plasmid is shown in fig. 56. A transcript of this tRNA3bVal-like gene would differ from the tRNA3bVal sequence at four sites; nucleotides C5, C16, G68 and G69 would be replaced by U5, U16, A68 and A69. To clarify the positions of the nucleotide replacements and the XmaI site, the cloverleaf structure of this putative tRNA is displayed in fig. 57. However, an in vivo tRNA species containing these nucleotide changes has not yet been reported.

The proline tRNA gene in pDt41R is 276 bp downstream from the tRNA3bVal gene and is of opposite polarity. Neither of the two major tRNAPro species of Drosophila has been sequenced (White et al., 1973), so a comparison to the gene sequence is not yet possible. However, the non-transcribed strand of the tRNAUGGPro gene in pDt41R is 95% homologous to the corresponding tRNA of mouse and chicken (Sprinzl et al., 1987).
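The use of the XmaI recognition sequence (CCCGGG) as a marker for gene position can be illustrated with a simple string search. The following is a minimal sketch, not the procedure used in this work: the helper `find_sites` is hypothetical, and the short test fragment is copied from the TψC region of the pDt48 row of fig. 56 and should be treated as illustrative only. Because CCCGGG is palindromic, searching one strand is sufficient.

```python
def find_sites(seq, site="CCCGGG"):
    """Return 1-based start positions of every occurrence of `site` in `seq`.

    Overlapping occurrences are reported, since the search resumes one
    base after each hit.
    """
    seq = seq.upper()
    site = site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = seq.find(site, start + 1)
    return positions


# Illustrative fragment spanning the T-psi-C region of the Val-like gene.
fragment = "GTCCCCGGTTCGAACCCGGGCGAAAAC"
print(find_sites(fragment))
```

For a cloned insert, each position returned by such a search marks a candidate location for a tRNA gene carrying the site, which must then be confirmed by sequencing, as was done here.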
Fig. 56. The sequenced segments of pDt41R, pDt48 and the corresponding region from Canton-S from region 1 of chromosomal site 90BC (see fig. 3) are shown. The genes encoding the tRNA3bVal-like and the tRNAUGGPro are boxed. The small arrowheads indicate the 3'-ends of the two tRNA genes and the direction of transcription. The top line is the sequenced region from pDt48. Aligned underneath are the sequenced portions of pDt41R and the corresponding segment derived from the Canton-S strain (Can-S) as reported by DeLotto and Schedl (1984). Dashed lines indicate identical sequences. Where differences do occur, they are indicated as (*) for deletions; substitutions are indicated by the base replacements.

Fig. 57. The cloverleaf structure of Drosophila tRNA3bVal (Addison, 1982; Addison et al., 1984) with sites of the four differences indicated for a hypothetical product from the variant gene. The XmaI site (CCCGGG), exploited in localization of the genes, corresponds to bases 60-65.

Homologies With Another tRNA3bVal-Containing Plasmid

A second plasmid, designated as pDt48, containing a 2.4 kb Drosophila insert, also maps to the 90BC locus by in situ hybridization (Dunn et al., 1979b). It also contains an identical tRNA3bVal-like and a tRNAUGGPro gene corresponding to two XmaI sites in the Drosophila insert. The portion of the plasmid containing the two genes has been sequenced by Dr. A. D. Delaney and it is also aligned in fig. 56. The structural sequences of the two tRNA genes are identical to their counterparts in pDt41R. Outside the genes, the two plasmids also show 92% homology. Most of the differences between the sequenced portions of pDt41R and pDt48 are the result of single base pair changes, or insertions or deletions of a few base pairs.
The only major difference is a 14-bp sequence, characterized by restriction sites for DdeI and HinfI, that is present in pDt48 between nucleotides 325 and 338 and is not found in pDt41R.

In an initial attempt to determine the extent of homology between the two Drosophila inserts without extensive sequencing, the XmaI + HindIII fragments from both plasmids were purified from a 5% polyacrylamide gel and restriction mapped by the Smith-Birnstiel method (Smith and Birnstiel, 1976; see Methods and Materials). Each fragment was mapped twice, first using the radiolabelled HindIII site as the reference point and then with the same fragment labelled at the XmaI site. Furthermore, the predicted restriction sites were confirmed by complete digests of the entire plasmids or purified inserts. When the maps of both plasmids were aligned with their XmaI sites coincident, the distribution of the other restriction sites suggested that there is distinct but incomplete homology between the Drosophila inserts (fig. 58).

To examine the homologies with more precision, the two plasmids were compared by the Southern cross-annealing analysis of Dr. C. A. Hutchison III (for an example, see Sato et al., 1977). Fragments produced by digestion of 6 µg of pDt41R with HinfI + HindIII were separated by electrophoresis in a 5% polyacrylamide gel (17 cm x 17 cm) with a 15 cm wide slot and transferred to a sheet of cellulose nitrate by Southern blotting (Southern, 1975) with modifications according to Gergen et al. (1981). After transfer, the filter was dried in air and the DNA immobilized by baking at 80 °C in vacuo for 4 hours. The DNA fragments in a HinfI + HindIII digest of 4 µg of pDt48 were first labelled at their 3'-ends with [α-32P]-dATP and the Klenow enzyme prior to electrophoresis in a similar gel.
Bands of the radioactive fragments were transferred to the same piece of filter used in the first transfer, except that after DNA denaturation, all solutions for neutralization and transfer contained 0.5% SDS (w/v); the filter was rotated 90° to allow DNA bands from the two gels to intersect, and transfer was performed at 65 °C overnight to allow annealing between pDt41R and pDt48. After transfer, the filter was rinsed several times with untitrated 3 mM Tris base and then subjected to autoradiography without drying.

Fig. 58. Restriction maps of pDt48 (top) and pDt41R (bottom) as constructed by the Smith and Birnstiel method. The XmaI sites of the corresponding genes are aligned. The orientations of the two tRNA genes within the cloned insert are shown as arrows above the pDt48 restriction map. Other restriction sites for DdeI, HinfI, HaeIII and TaqI are shown as vertical tick marks below the maps.

Homology maps of the two plasmids were constructed based on the pattern of the hybridization spots (fig. 59, C). The results indicate that most of the differences in the distribution of restriction sites can be attributed to changes involving one, or a few, base pairs (fig. 59, A and B). However, two HinfI fragments of 80 and 130 bp in length derived from pDt48 failed to produce hybridization spots with pDt41R even after prolonged exposure of the autoradiograph (data not shown). The 80-bp fragment lies outside the 2.0-kb homologous region, while the 130-bp fragment lies within. The failure of the latter to hybridize suggests the presence of a sizable interruption in homology, which was previously undetected by restriction analysis. Homology resumes to the left of this interruption, as shown by the intense hybridization between the 450-bp fragment of pDt48 and the 250-bp fragment of pDt41R.
The smallest HinfI fragments of pDt48 have been omitted from the diagram. As they occur outside the 2.0-kb domain, they are unlikely to show any homology.

An additional major difference is that pDt41R is associated with highly repetitive sequences, as revealed by genomic Southern blotting, while pDt48 is essentially unique in the genome (data not shown). Since their restriction and homology maps do not differ extensively, it is likely that the repetitive element associated with pDt41R is small.

It seems likely that the plasmids pDt41R and pDt48 represent different alleles at the 90BC locus. Previous Southern hybridization of HindIII-digested genomic DNA with [125I]tRNA3bVal as a probe had revealed a hybridization band of 2.0 kb (as well as six other fragments) but not a 2.4-kb band (Tener et al., 1980). When genomic DNA from several sources (different lines of Oregon-R lab stocks, Samarkand and tissue culture) was surveyed with 32P-labelled pDt48 as the probe, the most prevalent hybridization pattern was represented by the non-isogenic Oregon-R, in which fragments of 2.0 and 4.6 kb occur at approximately the same frequency within the population. The Samarkand strain also contains both the 2.0- and 4.6-kb fragments, but with the latter occurring as the major allele in the population (data not presented). In addition, there is a possible trace signal at 2.4 kb. In contrast, only the 2.0-kb fragment remains detectable in the isogenic Oregon-R strain and tissue culture cells (data not shown). The clear correspondence between the tRNA and pDt48 hybridization patterns indicates that the 2.0-kb insert found in pDt41R is common in the population. pDt48 may represent an extremely rare allele, as suggested by the low hybridization signal in the Samarkand strain, or alternatively, it may have arisen as the result of a deletion of the 4.6-kb fragment during cloning.

Fig. 59. Homologies among all possible HinfI fragments in pDt48 and pDt41R as deduced by Southern-cross hybridization. (A) On the autoradiogram, the homologous sequences between fragments derived from the two plasmids appear as hybridization spots at their sites of intersection. (B) Schematic representation of panel A. The vertical dashed lines represent the positions of the unlabelled HinfI-generated pDt41R fragments on the cellulose nitrate filter. The horizontal bands represent labelled HinfI-generated fragments of pDt48 DNA transferred onto the same filter at 90° relative to the first band patterns. Homologies between the two pBR322 vectors are shown as open spots (which conveniently act as internal controls), and between the Drosophila DNA as solid spots. Hatched spots represent non-specific retention of labelled pDt48 DNA on the filter. The hybridization spot between the 190-bp fragment of pDt48 and the 67-bp fragment of pDt41R is clearly visible in the original autoradiograph but too faint in this reproduction. (C) Homology maps based on the interpretation of panel (B), illustrating the interruption in homologies by a 130-bp HinfI fragment in pDt48. The 50-, 80- and 40-bp non-hybridizing fragments are the first, second and third HinfI fragments, respectively, from the left HindIII site in fig. 58 (top). All of these fragments are located outside of the 2.0-kb homologous region.

DISCUSSION

The tRNA3bVal-like gene in pDt41R contains the expected triplet sequence, CAC, at the anticodon but differs from the expected sequence at nucleotides 5, 16, 68 and 69. Three of the four nucleotide differences (5, 68 and 69) occur within the putative tRNA's acceptor stem, but these should not disrupt the helical structure of the tRNA, since concomitant changes at both nucleotides 5 and 68 create a new base pair; in addition, the G-to-A transition at nucleotide 69 would allow a perfectly base-paired acceptor stem in the hypothetical transcript.
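The argument that paired changes at nucleotides 5 and 68 preserve the acceptor stem can be checked with a short base-pairing test. This is a minimal sketch under stated assumptions: the helper `pair_type` and the pairing tables are illustrative, only the 5:68 pair is examined because both bases are given in the text, and the partner of nucleotide 69 is left out because its identity is not stated here.

```python
# RNA base pairs written as (5' strand base, 3' strand base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(five_prime, three_prime):
    """Classify a candidate RNA base pair."""
    pair = (five_prime.upper(), three_prime.upper())
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


# Bases at positions 5 and 68 in the known tRNA and in the variant gene product.
reference = {5: "C", 68: "G"}
variant = {5: "U", 68: "A"}

print("reference 5:68 ->", pair_type(reference[5], reference[68]))
print("variant   5:68 ->", pair_type(variant[5], variant[68]))
# Either change alone would weaken or break the pair:
print("U5 with G68    ->", pair_type(variant[5], reference[68]))
print("C5 with A68    ->", pair_type(reference[5], variant[68]))
```

The comparison shows why the two substitutions are described as concomitant: the pair is Watson-Crick both before and after, while either substitution on its own leaves a wobble pair or a mismatch.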
A recent molecular walk at 90BC has isolated ten tRNA genes dispersed randomly in the 31 x 10^3 base pairs from this site (DeLotto and Schedl, 1984). Two of the tRNA genes, located in a 1.9-kb HindIII fragment corresponding to region 1 (fig. 3), are identical to the tRNA3bVal-like and the tRNAPro genes presented in this report. The sequence from this portion of their molecular map is presented in fig. 56. The gene flanking sequences reported by DeLotto and Schedl (1984) show close similarity to those in pDt41R (1.1% divergence) but again differ from those in pDt48 by approximately 9%. It is not known whether this is the only tRNA3bVal-like gene at 90BC, since there are at least four more tRNA genes which remain as yet uncharacterized.

A single tRNA3bVal gene has also been isolated from each of the major hybridization sites at 84D3-4 (designated as pDt78R and sequenced by Addison, 1982) and 92B1-9 (sequenced by Silberklang, Hosbach and McCarthy; see Leung et al., 1984). Genes from both sites are identical and correspond to the expected sequence to encode tRNA3bVal. Outside the structural gene, however, there is no homology between the two Drosophila inserts or with the flanking sequences in pDt41R. Subclones of the three plasmids, pDt41R, pDt48 and pDt78, containing only the tRNA3bVal genes and at least 20