@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix dc: . @prefix skos: . vivo:departmentOrSchool "Medicine, Faculty of"@en, "Medical Genetics, Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Goodchild, Nancy L."@en ; dcterms:issued "2009-04-08T17:58:03Z"@en, "1993"@en ; vivo:relatedDegree "Doctor of Philosophy - PhD"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description """Transposable elements comprise ~10% of the human genome and are hypothesized to play a significant role in genome variability and gene regulation. My work has focused on the RTVL-H family of human endogenous retroviruses and the possible impact these elements may have had and continue to have on the human genome. The evolutionary history of the RTVL-H family within the primate lineage was examined using several approaches. My findings suggest that the RTVL-H family has undergone two successive waves of amplification during primate evolution. The major amplification of RTVL-H elements appears to have occurred very early within the Old World primate lineage, after its divergence from the New World monkeys. The genomes of humans, apes and Old World monkeys are estimated to contain 50-100 copies of undeleted elements and 800-1000 copies of deleted sequences. This is in contrast to the New World monkey, marmoset, genome which contains only a handful of intact elements and ~50 deleted sequences. A second expansion of deleted elements appears to have occurred within a common ancestor of the ape lineage and was associated with a novel long terminal repeat (LTR) subtype. The structure of RTVL-H elements and their distribution in the genome suggest that RTVL-H elements amplified via retrotransposition. I have found evidence for this through the identification of an RTVL-H element in the orangutan genome that appears to represent the reverse transcription and integration of a spliced RTVL-H transcript. 
An apparent spliced genomic element found in human DNA lacks an intact 5’ LTR and may represent a processed RTVL-H pseudogene. This element appears to be polymorphic in the human population. I have also laid the groundwork for a strategy to demonstrate RTVL-H retrotransposition within an experimental time frame. This strategy uses a “retrotransposition indicator gene” and allows for the direct selection of a retrotransposition event. RTVL-H LTRs contain transcriptional regulatory sequences and thus may affect the expression of adjacent cellular sequences. I have identified a clone, termed cPj-LTR, containing an ORF of 223 aa that has been polyadenylated within an RTVL-H LTR. The corresponding gene, termed PLT, is a novel, multi-exon locus that appears to have been evolutionarily conserved. Northern analysis identified several PLT-related transcripts in placental RNA samples, one of which appears to be associated with the LTR. The presence of this PLT-LTR fusion transcript was confirmed by PCR. Analysis of additional PLT cDNA clones suggests that the PLT mRNA undergoes alternative splicing at its 3’ end, with polyadenylation within an RTVL-H LTR occurring in one of the resulting transcripts."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/6917?expand=metadata"@en ; dcterms:extent "5099351 bytes"@en ; dc:format "application/pdf"@en ; skos:note "THE IMPACT OF ENDOGENOUS RETROVIRUS-LIKE SEQUENCES ON THE HUMAN GENOME by Nancy L. Goodchild, B.Sc. (Biochemistry), B.Sc. (Biology), The University of Victoria, 1989. A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Faculty of Graduate Studies, Genetics Program. We accept this thesis as conforming to the required standard. The University of British Columbia, December 1993. © Nancy L. 
Goodchild, 1993. In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. (Signature) Department of Medical Genetics, The University of British Columbia, Vancouver, Canada. ABSTRACT Transposable elements comprise ~10% of the human genome and are hypothesized to play a significant role in genome variability and gene regulation. My work has focused on the RTVL-H family of human endogenous retroviruses and the possible impact these elements may have had and continue to have on the human genome. The evolutionary history of the RTVL-H family within the primate lineage was examined using several approaches. My findings suggest that the RTVL-H family has undergone two successive waves of amplification during primate evolution. The major amplification of RTVL-H elements appears to have occurred very early within the Old World primate lineage, after its divergence from the New World monkeys. The genomes of humans, apes and Old World monkeys are estimated to contain 50-100 copies of undeleted elements and 800-1000 copies of deleted sequences. This is in contrast to the New World monkey, marmoset, genome which contains only a handful of intact elements and ~50 deleted sequences. A second expansion of deleted elements appears to have occurred within a common ancestor of the ape lineage and was associated with a novel long terminal repeat (LTR) subtype. The structure of RTVL-H elements and their distribution in the genome suggest that RTVL-H elements amplified via retrotransposition. 
I have found evidence for this through the identification of an RTVL-H element in the orangutan genome that appears to represent the reverse transcription and integration of a spliced RTVL-H transcript. An apparent spliced genomic element found in human DNA lacks an intact 5’ LTR and may represent a processed RTVL-H pseudogene. This element appears to be polymorphic in the human population. I have also laid the groundwork for a strategy to demonstrate RTVL-H retrotransposition within an experimental time frame. This strategy uses a “retrotransposition indicator gene” and allows for the direct selection of a retrotransposition event. RTVL-H LTRs contain transcriptional regulatory sequences and thus may affect the expression of adjacent cellular sequences. I have identified a clone, termed cPj-LTR, containing an ORF of 223 aa that has been polyadenylated within an RTVL-H LTR. The corresponding gene, termed PLT, is a novel, multi-exon locus that appears to have been evolutionarily conserved. Northern analysis identified several PLT-related transcripts in placental RNA samples, one of which appears to be associated with the LTR. The presence of this PLT-LTR fusion transcript was confirmed by PCR. Analysis of additional PLT cDNA clones suggests that the PLT mRNA undergoes alternative splicing at its 3’ end, with polyadenylation within an RTVL-H LTR occurring in one of the resulting transcripts. TABLE OF CONTENTS: ABSTRACT; TABLE OF CONTENTS; LIST OF TABLES; LIST OF FIGURES; LIST OF ABBREVIATIONS; ACKNOWLEDGMENTS; CHAPTER I INTRODUCTION; A. TRANSPOSABLE ELEMENTS; 1. Classification; 2. Distribution; B. AN OVERVIEW OF HUMAN RETROELEMENTS; 1. Non LTR-containing retroelements; 1.1. SINEs; 1.2. LINEs; 2. LTR-containing retroelements; 2.1. Retrotransposons; 2.2. THE-1; 2.3. Human endogenous retroviruses; 2.3.1. Class I HERVs; (a) HERV-E; (b) ERV1; (c) ERV3; (d) RRHERV-I; (e) S71; (f) RTVL-I; (g) ERV9; (h) HRES-1; (i) HERV-P; 2.3.2. Class II HERVs (HERV-K); 2.4. 
The RTVL-H family; C. EVOLUTIONARY HISTORY OF RETROELEMENTS; 1. SINEs; 1.1. The origin of SINEs; 1.2. The primate Alu family; 2. LINEs - The L1 family; 3. LTR-containing retroelements; 3.1. HERVs; 3.2. THE-1 elements; 4. Evolution of primates; D. RETROTRANSPOSITION; 1. Mechanisms of retrotransposition; 1.1. Retrotransposons; 1.1.1. Transcription; 1.1.2. Reverse transcription; 1.1.3. Integration; 1.1.4. Cis-acting sequences; 1.1.5. Retrotransposition is mutagenic; 1.1.6. Rates of retrotransposition; 1.2. LINEs - L1 elements; 1.2.1. Transcription; 1.2.2. Reverse transcription/integration; 1.3. SINEs - Alu elements; 1.3.1. Transcription; 1.3.2. Reverse transcription; 1.3.3. Integration; 2. Regulation of retrotransposition by the host cell; 2.1. Transcriptional regulation; 2.2. DNA methylation; 2.3. Post-transcriptional regulation; 3. Activation of retrotransposition by exogenous factors; 3.1. Hormones; 3.2. Environmental stresses; 3.3. Cell culture; E. THE IMPACT OF RETROELEMENTS ON THE HOST; 1. Agents of genetic disease; 1.1. De novo insertions; 1.2. Element-mediated rearrangements; 1.3. HERVs and autoimmune disease; 2. Functional assimilation; 3. Mediators of genetic variation; F. 
THESIS OBJECTIVES; CHAPTER II MATERIALS AND METHODS; Cell lines; Transfections; CAT assays; Generation of retrotransposition (RT) cell lines; Selection systems for retrotransposition events; Construction of the neo-int cassette and the RT vectors; Probes; Library screening; DNA isolation and Southern analysis; RNA isolation, cDNA synthesis and Northern analysis; DNA sequencing and computer analysis; PCR analysis; CHAPTER III EXAMINATION OF THE EVOLUTIONARY HISTORY OF THE RTVL-H FAMILY OF HERVs; INTRODUCTION; RESULTS; Amplification of deleted elements in the Old World primate lineage; Recent expansion of a subfamily of deleted elements; Structures of RTVL-H LTR subtypes; Copy numbers of the LTR subtypes in various primate species; Determination of integration times of individual RTVL-H elements; RTVL-H associated polymorphisms; DISCUSSION; CHAPTER IV RTVL-H RETROTRANSPOSITION; INTRODUCTION; RESULTS; Isolation of previously retrotransposed elements; Detection of transposition through target gene disruption; Detection of transposition through the use of a “retrotransposition (RT) indicator gene”; Testing the neo-int cassette; Choosing the donor RTVL-H element; Introduction of the RTVL-H/neo-int vector into a human cell line; Introduction of the RTVL-H/neo-int vector into a mouse packaging cell line; 
DISCUSSION; CHAPTER V AN RTVL-H LTR PROVIDES A POLYADENYLATION SIGNAL TO A NOVEL ALTERNATIVELY SPLICED TRANSCRIPT IN NORMAL PLACENTA; INTRODUCTION; RESULTS AND DISCUSSION; Identification of non-RTVL-H cellular transcripts polyadenylated within RTVL-H LTRs; Sequence analysis of cPj-LTR; Genomic organization of the PLT locus; Expression of PLT-related transcripts; Detection of PLT-LTR RNA and genomic DNA by PCRs; Isolation and analysis of additional PLT-related cDNA clones; CHAPTER VI SUMMARY AND PERSPECTIVES; REFERENCES. LIST OF TABLES: Table 1-1 Summary of known HERV families; Table 2-1 Sequences of primers used in the various PCR strategies; Table 3-1 Copy numbers of RTVL-H LTR subtypes in different species; Table 3-2 RTVL-H elements analyzed for time of integration; Table 3-3 Summary of PCR analysis of 11 RTVL-H loci in primates. LIST OF FIGURES: Figure 1-1 Classification of transposable elements; Figure 1-2 Structural features of retroelements; Figure 1-3 Characterization of RTVL-H pol region deletions; Figure 1-4 A scheme for the origin of retroelements; Figure 1-5 Primate phylogeny; Figure 1-6 Comparison of the life cycles of retrotransposons and retroviruses; Figure 1-7 Transcriptional regulatory elements in the retroviral LTR; Figure 1-8 A model for retroviral DNA synthesis by reverse transcription; Figure 1-9 A model for the integration of retroviral DNA into the host cell genome; Figure 1-10 Cis-acting elements in genomic viral RNA; Figure 2-1 Construction of the neo-int cassette and RTVL-H/neo-int retrotransposition vector; Figure 2-2 Recombinant PCR strategy used to mutate the 5’ most base of the cH-4 3’ LTR; 
Figure 3-1 PCR primers and probes spanning the A-B-C pol region; Figure 3-2 Southern analysis of primate DNAs using the pol A region probe; Figure 3-3 PCR among primate species using primers that span the pol A-B-C region; Figure 3-4 Southern analysis of a panel of primate DNAs using an env-specific probe; Figure 3-5 RTVL-H LTR subtypes; Figure 3-6 Southern analysis of primate DNAs using the LTR subtype-specific probes; Figure 3-7 PCR strategy for examining individual RTVL-H loci across primate species; Figure 3-8 PCR analysis of the RTVL-H3 integration site in primates; Figure 3-9 Diagram illustrating the integration times of the individual RTVL-H elements presented in Table 3; Figure 3-10 Southern analysis of human DNAs probed with two RTVL-H subset probes; Figure 4-1 Detection of spliced genomic RTVL-H elements; Figure 4-2 Sequence alignments of 5’ and 3’ splice junctions of the PCR clones generated by the strategy of Fig. 4-1; Figure 4-3 PCR strategy to distinguish between retrotransposed, spliced RTVL-H elements and processed RTVL-H pseudogenes; Figure 4-4 Analysis of the putative spliced RTVL-H element identified in certain human DNAs; Figure 4-5 Analysis of the putative spliced RTVL-H element identified in orangutan DNA; Figure 4-6 Representative Southern analysis of HPRT- mutants; Figure 4-7 Strategy to detect new retrotranspositions through the use of an ‘RT indicator gene’; Figure 4-8 Southern analysis of GP+E86 RT cell lines; Figure 4-9 Northern analysis of GP+E86 RT cell lines; Figure 5-1 Strategy for isolating non-RTVL-H cellular sequences being polyadenylated within RTVL-H LTRs; Figure 5-2 Schematic representation of two unrelated cDNA clones correctly polyadenylated within an RTVL-H LTR; Figure 5-3 Sequence analysis of cPj-LTR; Figure 5-4 Southern analysis of human and other primate DNAs with a probe specific for the non-RTVL-H portion of cPj-LTR; Figure 5-5 Northern analysis of PLT-related sequences; Figure 
5-6 PCR analysis of placental RNA with primers spanning the cellular-LTR junction of cPj-LTR; Figure 5-7 PCR analysis of the cPj-LTR RTVL-H integration site in primates; Figure 5-8 Diagram of the 3’ portions of PLT-related cDNA clones. LIST OF ABBREVIATIONS: aa, amino acid; ATCC, American Type Culture Collection; BaEV, baboon endogenous virus; bp, base pair(s); BSA, bovine serum albumin; CAT, chloramphenicol acetyltransferase; EC, embryonal carcinoma; EM, electron microscopy; ERV, endogenous retrovirus; EtdBr, ethidium bromide; FCS, fetal calf serum; HAT, hypoxanthine/aminopterin/thymidine; HERV, human endogenous retrovirus; HPRT, hypoxanthine guanine phosphoribosyl transferase; IAP, intracisternal A particle; kb, kilobase(s); LINE, long interspersed nuclear element; LTR, long terminal repeat; MaLR, mammalian apparent LTR-retrotransposons; MLV, murine leukemia virus; MMTV, mouse mammary tumor virus; MYr, million years; nt, nucleotide(s); ORF, open reading frame; PBS, primer binding site; PCR, polymerase chain reaction; PLT, placental LTR terminated gene; PPT, polypurine tract; PSβG, pregnancy-specific β1 glycoprotein genes; RE, restriction enzyme; RT, retrotransposition; RTVL-H, retrovirus-like element with a histidine tRNA PBS; RVTase, reverse transcriptase; SA, splice acceptor; SD, splice donor; SDS, sodium dodecyl sulphate; SINE, short interspersed nuclear element; TE, transposable element; THE-1, transposon-like human element-1; 6-TG, 6-thioguanine; UAS, upstream activating sequence; UTS, untranslated sequence; VLP, virus-like particle. ACKNOWLEDGMENTS I would like to take this opportunity to thank Dixie Mager for her support and encouragement over the last five years. I would especially like to thank Dixie for her patience and understanding during my ‘moment of crisis’. 
She made me see that I was getting somewhere with my research. I would like to thank Doug Freeman for his indispensable help with my various projects, whether it entailed constructing a retrotransposition indicator gene or simply pounding the bubbles out of my sequencing gel. I would like to thank the members of the Mager lab, past and present - Anita, Dave, Jack, and especially Paul and David - for making the days (and nights!) in the lab more enjoyable. Thank you also to Dean and Carmine. I am grateful to the members of my supervisory committee - Connie Eaves, Fumio Takei and Michael Hayden - for helpful discussions and for the critical reading of my thesis. I would also like to acknowledge the research support of the Terry Fox Laboratory and the financial support of the Medical Research Council of Canada. Last but definitely not least, I would like to give many heartfelt thanks to my family for their continuing support and to Todd for just being there when I needed him the most. CHAPTER I INTRODUCTION A. TRANSPOSABLE ELEMENTS Transposable elements are DNA sequences which can insert into different genomic sites. First described in maize by Barbara McClintock more than 40 years ago (McClintock, 1952), transposable elements have since been found in all genomes in which they have been sought and are thus believed to be ubiquitous. 1. CLASSIFICATION A classification scheme for transposable elements is shown in Figure 1-1. Transposable elements can be divided into two broad groups according to their mechanisms of transposition (Finnegan, 1989a). The elements of one group transpose via the reverse transcription of an RNA intermediate and hence are given the general name of “retroelements”. Retroelements can be further subdivided based upon the presence or absence of long terminal repeats (LTRs) that bound the element. 
The LTR-containing retroelements include retroviruses and retrotransposons, sequences that are retrovirus-like in overall structural organization but lack an env domain and are not infectious. Examples of retrotransposons include the Ty elements of Saccharomyces cerevisiae, the copia-like elements of Drosophila melanogaster, and the intracisternal A particle (IAP) elements of the mouse. The non-LTR-containing retroelements include elements belonging to one of several structurally distinct groups. “Retroposons” are elements that mediate their own transposition either by encoding a reverse transcriptase activity or by possessing sequence characteristics that allow their RNAs to be recognized in trans by other retrotransposition mechanisms. Retroposons include LINEs (long interspersed nuclear elements), SINEs (short interspersed nuclear elements), mitochondrial introns and mitochondrial plasmids (Hull and Will, 1989). Other non-LTR-containing retroelements include “retrons”, a name coined for retroelements that use unconventional mechanisms of reverse transcription, such as the multicopy single-stranded DNA, or msDNA, of myxobacteria and Escherichia coli (Hull and Will, 1989; Garfinkel, 1992); and “retrosequences”, cDNA genes and pseudogenes formed by reverse transcription and reintegration of mRNAs. Because retrosequences do not carry reverse transcriptase genes or any sequence for efficient reverse transcription, their retrotransposition is passive (Hull and Will, 1989; Weiner et al., 1986). [Figure 1-1. Classification of transposable elements. See text for details. RT, reverse transcriptase.] For all retroelements, transposition by reverse transcription would be a replicative event as the donor element would not have to excise from the genome but simply be expressed. 
As a result, the number of retroelements in the genome should increase after each transposition event. This has been demonstrated for I elements in D. melanogaster (Bucheton, 1990) and Ty elements in S. cerevisiae (Curcio and Garfinkel, 1991) when they are induced to transpose at high frequency by genetic manipulation. Another hallmark of all active retrotransposition events is the creation of a target site duplication at the site of integration. The second group of transposable elements is believed to transpose directly from DNA to DNA (Finnegan, 1989a), and its members are generally referred to as “transposons”. Transposons can be subdivided based on differences in structural organization. Elements such as P elements in D. melanogaster, Ac/Ds elements in Zea mays, and Tc1 elements in Caenorhabditis elegans have short terminal inverted repeats and some are known to encode a function (i.e. transposase) required for their own transposition (Finnegan, 1989a). These elements can all excise, which may (Ac/Ds elements) or may not (P elements) be part of the transposition mechanism (Finnegan, 1989a). Other transposons such as the foldback or FB elements of D. melanogaster and the TU elements of the sea urchin Strongylocentrotus purpuratus have long terminal inverted repeats that are internally repetitious and do not appear to encode a transposase function (Finnegan, 1989a). 2. DISTRIBUTION Though transposable elements are thought to be ubiquitous, different transposable element groups do show different distributions. Both retroelements and transposons have been identified in prokaryotes and eukaryotes (Galas, 1990; Hull and Will, 1989; Finnegan, 1989a). However, within the eukaryotic kingdom, retroelements have a much wider distribution, being found in representatives of plants, invertebrates, and vertebrates (Hull and Will, 1989; Finnegan, 1989a). 
In contrast, transposons, while commonly found in invertebrates and plants, are not generally found in vertebrate species (Finnegan, 1989a). The different retroelement families also show different patterns of distribution. For the LTR-containing retroelements, examples of retrotransposons can be found in eukaryotic microorganisms, in both invertebrate and vertebrate species, and in plants. Retroviruses, in contrast, appear to be confined to vertebrate species. For the non-LTR-containing retroposons, L1 and L1-like sequences are widespread, being found in plants, invertebrates and vertebrates (Hutchison et al., 1989). In contrast, most SINE family amplifications appear to have occurred after the mammalian radiation, with different SINE families arising independently in different species (Deininger, 1989). Only a few of the interspersed repeated sequences found in the non-mammalian species studied to date have the features of mammalian SINEs, whereas the majority of interspersed repeats in mammals are SINEs (Deininger, 1989). As the subject of this thesis is a family of endogenous retroviral elements present in the human genome, further discussion will focus on retroelements. B. AN OVERVIEW OF HUMAN RETROELEMENTS The current number of known human repetitive families exceeds 50 (Jurka et al., 1992). Most of these families represent medium reiteration frequency (MER) repeats of unknown origin with a copy number per genome ranging from hundreds to thousands (Jurka, 1990; Kaplan et al., 1991; Jurka et al., 1993). While these families show no evidence of genetic mobility, other human repetitive families do. The vast majority of transposable elements in humans are believed to be retroelements. However, this classification is often made based on structural similarities to known transposable elements of lower eukaryotes and not from direct demonstration of transposition through an RNA intermediate. 1. 
NON LTR-CONTAINING RETROELEMENTS These elements do not have a retrovirus-like structure but instead have sequence characteristics of mRNAs or other small RNAs that have been reverse transcribed and inserted back into the genome. These elements include the processed pseudogenes and the high copy number retroposon families, SINEs and LINEs (Weiner et al., 1986). SINEs and LINEs were originally distinguished on the basis of the length of the individual elements, i.e. short and long interspersed nuclear elements, respectively (Singer, 1982). However, these elements also exhibit functional differences in terms of element transcription and coding capacity. Both the SINEs and LINEs represent very efficient retroposons, having expanded to very high copy numbers in the human genome. This is in contrast to the infrequent retrotransposition associated with processed pseudogenes, which are present at only very low copy numbers. 1.1 SINEs SINEs are high copy number, short interspersed repetitive elements typically ranging from 75 to 500 bp in length. The “generic” SINE sequence contains an internal RNA pol III promoter, an A-rich 3’ end which can be quite variable in length and exact sequence even among members of a given family, and direct flanking repeats, varying in size from a few bp to several hundred bp in length and generally A rich (Deininger, 1989) (see Fig. 1-2). The predominant SINE in humans is the Alu family (Daniels et al., 1983). The Alu family was originally defined as a fraction of renatured, repetitive duplex DNA containing a distinctive AluI cleavage site (Houck et al., 1979). The family is comprised of elements ~300 bp in length and present in 10^5-10^6 copies, distributed fairly randomly in the human genome. Alu elements have all the features of the generic SINE but also have a dimer-like structure, with Alu-left and Alu-right being connected by an adenosine-rich linker. The right half of the sequence is clearly related to the left half (consensus sequences for the two dimer halves are matched at about 68% of positions) but has 31 extra bases and does not contain an active RNA pol III promoter. Thus, the promoter of the left half directs the transcription of the entire element. Both halves of the Alu element were derived from the 7SL RNA gene, whose product is a component of the signal recognition particle (Ullu and Tschudi, 1984). [Figure 1-2. Structural features of retroelements. Panels: A. Retroviruses (gag, pol, env; e.g. murine leukemia virus); B. Retrotransposons (PBS, gag, pol; e.g. Drosophila copia, yeast Ty1; e.g. human THE-1); C. LINEs (ORF-1, ORF-2 (pol), poly(A) tail; e.g. human L1); D. SINEs (poly(A) tail; e.g. human Alu). The general structures of the different classes of retroelements are depicted. Small arrows indicate short direct repeats of cellular DNA. P denotes the internal promoters of SINEs and LINEs. PBS and PPT show the positions of the 5’ tRNA primer binding site and 3’ polypurine tract that serve as primer binding sites for minus- and plus-strand DNA synthesis during retroviral replication.]
1.2 LINEs Originally detected as rapidly reannealing components of genomic DNA, LINEs were initially defined as elements greater than 5 kb in length and present at copy numbers of >10^4 per genome (Singer, 1982). The term LINE has since become synonymous with L1 elements, the first LINE family identified and still the only LINE family as originally defined. Hutchison et al. (1989) have suggested a redefinition of LINEs as active retroposons which encode proteins likely to mediate their own transposition. However, this definition could go still further in that the designation “LINE” also denotes an element having certain structural features as described below for L1 elements. The predominant LINE in humans is the L1 family, originally called the KpnI family due to the presence of conserved KpnI sites in the majority of the repeat family members (Adams et al., 1980). 
These elements are present at about 10^5 copies per haploid genome (Hwu et al., 1986), with unit-length members being approximately 6 kb long and present at ~4 x 10^3 copies but with the majority of elements (>5 x 10^4) being 5’ truncated to varying degrees. Full length elements have the following features (reviewed in Hutchison et al., 1989) (see Fig. 1-2). At the 5’ end is an 800 bp long untranslated sequence (UTS), the extreme 5’ end of which has an unusually high G + C content. The 5’ UTS appears to contain an internal RNA pol II promoter (Swergold, 1990; Minakami et al., 1992). The central region of the consensus L1 sequence contains two ORFs of about 1,122 and 3,852 bp, separated by a small distance but in the same reading frame. ORF-2 has amino acid (aa) homology to reverse transcriptase (Hattori et al., 1986) and to nucleic acid-binding proteins (Fanning and Singer, 1987). Recently, ORF-2 isolated from a newly transposed L1 element (Dombroski et al., 1991) has been shown to encode a reverse transcriptase activity (Mathias et al., 1991). At the 3’ end of the element is an ~200 bp UTS followed by an adenine-rich 3’ terminus. Short direct repeats (usually <20 bp in length) flank both complete and truncated elements. These features suggest that L1 elements are produced by reverse transcription of polyadenylated mRNAs and insertion of the reverse transcripts into the genome. 5’ truncated elements are thought to be the result of incomplete reverse transcription. Finally, the L1 family contains a polymorphism for a 132 bp insert 5’ of ORF-1 (Skowronski et al., 1988), with 50% of elements not truncated before this region containing this additional 132 bp. 2. LTR-CONTAINING RETROELEMENTS The hallmark of this type of retroelement is the presence of LTRs; the group includes endogenous proviruses, retrotransposons, and the human THE-1 family. 2.1 Retrotransposons Retrotransposons are structurally similar to retroviruses except for the absence of an identifiable env domain (see Fig. 1-2). 
These elements typically average 5-6 kb in size, with two ORFs corresponding to the retroviral genes gag and pol. In most cases, gag and pol overlap for a number of codons in different reading frames. The pol gene of the yeast Ty1 is expressed from the same mRNA as gag through a ribosomal frameshifting event (Garfinkel, 1992), a mechanism similar to that used by retroviruses to downregulate the expression of pol relative to gag (Varmus and Brown, 1989). One notable exception to this is copia, which has a single ORF with both gag and pol homologies. In this case copia may regulate the expression of pol relative to gag (Echalier, 1989; Yoshioka et al., 1991) through a splicing event that results in the formation of a 2 kb gag mRNA. The pol gene products almost always contain sequences similar to the protease (PR), reverse transcriptase (RT), RNaseH (RH) and integrase (IN) functions of true retroviruses. The order of functional domains in the pol gene distinguishes two broad retrotransposon groups. Most retrotransposons as well as the true retroviruses proceed PR, RT, RH, IN in the pol gene (Echalier, 1989; Varmus and Brown, 1989). Members of the Ty1-copia group, which includes Ty1 and Ty2 of S. cerevisiae and copia of D. melanogaster, have the functional domains in pol ordered PR, IN, RT, RH (Echalier, 1989; Garfinkel, 1992). Ty1, Ty2 and copia specify the production of virus-like particles (VLPs) consisting of element-encoded reverse transcriptases, coat proteins, and full length transcripts (Echalier, 1989; Garfinkel, 1992). Murine IAP elements also direct the synthesis of a VLP which is compartmentalized largely in the cisternae of the endoplasmic reticulum (Kuff and Lueders, 1988). This localization is apparently due to the IAP gag protein having a signal sequence. Particle assembly occurs within the secretory pathway; the processing of gag and gag-pol proteins normally seen in retroviruses and retrotransposons is not observed in IAP (Kuff and Lueders, 1988). 
Reverse transcription in these elements is very inefficient, possibly as a consequence of their unusual cellular location. Some retrotransposons, e.g. gypsy, do have a third ORF 3’ to the pol gene that corresponds in position and size to retroviral env genes though homology to retroviral env genes is not apparent (Echalier, 1989). However, retroviral env genes are not well conserved and these retrotransposon ORFs are characterized by a potential membrane spanning domain near the C-terminus which suggests that they may indeed encode membrane proteins. For IAPs, the majority of elements lack an env domain, but a small subset of ~200 elements, designated IAPE (IAP envelope), have recently been identified that do contain this retroviral ORF (Reuss and Schaller, 1991). Whether or not the human genome is considered to contain retrotransposons depends upon how the term “retrotransposon” is defined. As discussed above, retrotransposons are usually defined as retrovirus-like elements, 5-6 kb in length, having only an intracellular transposition cycle due to the lack of an env gene. Using this definition, the human RTVL-H family (discussed in more detail below) could have been designated a human retrotransposon family because the elements originally isolated were 5.8 kb in length and had only gag and pol homologies (Mager and Henthorn, 1984; Mager and Freeman, 1987). However, the recent discovery of a small subpopulation of RTVL-H elements possessing an env domain (Hirose et al., 1993; Wilkinson and Mager, 1993) suggests that RTVL-H elements derive from a retrovirus and that most elements are deleted. In contrast, there has never been any evidence to suggest that elements such as Ty and copia have ever had a 3’ ORF, and thus these could be considered ‘true retrotransposons’. All LTR-containing retroelements in the human genome appear to represent degenerate proviral families. The exception is the THE-1 family discussed below. 
Their small size (2.3 kb in length) and lack of homology to any retroviral gene (see Fig. 1-2) make these elements distinct from both the degenerate proviral families and retrotransposons.

2.2 THE-1

The THE-1 (transposon-like human element) family is present in ~40,000 copies plus ~30,000 solitary LTRs (Paulson et al., 1985). Solitary THE-1 LTRs had been previously identified and called “O elements” (Sun et al., 1984). The consensus THE-1 element is 2.3 kb in length with ~350 bp LTRs and is typically flanked by 5 bp direct repeats (Paulson et al., 1985). The presence of LTRs and an apparent target site duplication suggests that THE-1 elements have dispersed within the genome by a mechanism of retrotransposition. However, the consensus THE-1 element, in addition to its small size, does not contain any identifiable ORF. There is no detectable homology to any retroviral gene (Paulson et al., 1985); nor is there any discernible 5’ tRNA primer binding site (PBS) or 3’ polypurine tract (PPT) for priming cDNA synthesis by reverse transcriptase (see Fig. 1-2). Interestingly, extrachromosomal circular DNAs containing THE-1 sequences have been detected in several cell lines (Paulson et al., 1985; Misra et al., 1987). Discrete 1.9 kb circular DNAs, representing ~10% of the THE-1 homologous circular DNAs present in HeLa cells, have been cloned and characterized (Misra et al., 1987; 1989). These clones apparently were derived from a single genomic locus by a site specific recombination event involving imperfect short direct repeats in the adjacent cellular sequences (Misra et al., 1989). Thus, these clones do not represent retrotransposition intermediates. It remains to be determined if the other THE-1-homologous circular DNAs in HeLa cells are derived from similar recombination events at other unstable loci or if they are retrotransposition intermediates.

The transcriptional activity of THE-1 repeats has been studied in various cell lines and tissues (Paulson et al., 1987).
While discrete RNAs containing THE-1 sequences have been detected, including a putative unit-length 1.9 kb transcript (Paulson et al., 1985), this transcription is in fact being directed by other external cellular promoters and both THE-1 strands are represented in these transcripts (Paulson et al., 1987). A THE-1 LTR-promoted transcript has not been identified. Interestingly, several cDNA clones have been identified in which a THE-1 LTR has provided a polyadenylation signal (Paulson et al., 1987). Also, a full length THE-1 sequence has been found in the 3’ untranslated region of 3 of 4 alternatively processed mRNAs that code for a member of the calmodulin superfamily of calcium binding proteins (Deka et al., 1988).

2.3 Human endogenous retroviruses

The human genome contains numerous sequences resembling in structure either full-length or truncated proviruses (see Table 1-1). These human endogenous retroviral (HERV) sequences can be arranged into families which can be classified into one of two groups based upon pol sequence homologies. HERVs with homology to mammalian type C retroviruses have been termed Class I. Those with homology to mammalian types A, B and D, and avian type C have been termed Class II. For the individual HERV families, Larsson et al.
(1989) have proposed a classification scheme based on the tRNA primer binding site. For example, HERV-E would designate the family of elements having a PBS homologous to the 3’ terminus of glutamic acid tRNA [the single letter code for glutamic acid is E]. However, this classification scheme has not been universally adopted and thus the original acronyms assigned by the researcher(s) who discovered each family will also be used here.

Table 1-1. Summary of known HERV families. From Wilkinson et al., in press.

HERV family | Subfamily | Copy number | Method of detection | Some sites of transcription | Evidence for protein expression
HERV-ERI | HERV-E | 35-50 | rtv. (a) homology | placenta, colon, brain, colon & breast tumor lines |
HERV-ERI | Hs-5 | 1? | rtv. homology | |
HERV-ERI | NP-2 | 2 | HERV-E homology | none detected |
HERV-ERI | ERV1 | 1 (~15 related) | rtv. homology | transcription unlikely |
HERV-ERI | HERV-R (ERV3) | 1 (~40 related) | rtv. homology | placenta, low levels in most cells | env
HERV-ERI | RRHERV-I | ~20 | chance | r.a. (b) treated TC (c) cells |
HERV-ERI | HERV-I (RTVL-I) | ~25 | chance | ? |
HERV-ERI | S71 | 1 (~20 related) | rtv. homology | placenta, K562 cells |
HERV-ERI | HRES-1 | 1 | rtv. homology | tumor cell lines | 28 kd (gag?)
HERV-P | HuERS-P1 | 10-20 | tRNA homology | ? |
HERV-P | HuERS-P2 | 20-30 | tRNA homology | ? |
HERV-P | HuERS-P3 | 20-40 | tRNA homology | ? |
ERV-9 | | ~40 | chance | TC cells, placenta |
HERV-H (RTVL-H) | | ~900 (deleted), 50-100 (intact) | chance | TC cells, placenta, tumor cell lines |
HERV-K | | ~50 | rtv. homology | TC cells, placenta, tumor cell lines | gag, prt

(a) retroviral; (b) retinoic acid; (c) teratocarcinoma cell line

2.3.1 Class I HERVs

Class I HERV sequences are related to the mammalian C type viruses. Families consist of full length and incomplete elements, with copy numbers ranging from one to thousands of dispersed copies. However, there is no evidence that these HERVs can produce infectious virions.
All HERV genome sequences so far examined have been shown to be defective, with elements being partially deleted and/or one or more of the gag, pol, and env domains being disrupted by stop mutations.

(a) HERV-E

HERV-E sequences were identified by screening a human genomic library under low stringency hybridization conditions using as a probe cloned African green monkey DNA containing sequences related to murine leukemia virus (MLV) and baboon endogenous virus (BaEV) (Martin et al., 1981). The HERV-E family consists of both 8.8 kb full length elements and 6 kb truncated elements that lack env sequences (Steele et al., 1984; Repaske et al., 1983; 1985). Each is estimated at 35-50 copies in the human genome (Steele et al., 1984). Solitary HERV-E LTRs have also been identified (Steele et al., 1984). Mapping of HERV-E elements by in situ hybridization showed that while some HERV-E sequences are dispersed as single elements within the genome, there are also marked clusters in the telomeric regions and to a lesser extent the centromeric regions of chromosomes (Taruscio and Manuelidis, 1991).

The full length HERV-E clone 4-1 has been completely sequenced (Repaske et al., 1985). This 8.8 kb element has gag, pol and env related regions and is bounded by ~490 bp LTRs. The 5’ PBS corresponds to glutamic acid tRNA. In-frame stop codons occur in each of the retroviral coding regions. A truncated clone 51-1 (Steele et al., 1984) has also been sequenced.
This element contains only gag and pol related sequences and is bounded not by LTRs but by a unique, highly repetitive sequence comprised of 8-13 tandemly arranged copies of a 72-76 bp imperfect repeat.

A number of HERV-E elements appear to be associated with regions of amplified genomic DNA, as evidenced by the detection of conserved restriction endonuclease sites in the 3’ DNA flanking full length elements (Steele et al., 1984; 1986) and by the use of DNA 5’ to the truncated elements as a probe to isolate multiple, distinct clones of that family (Repaske et al., 1983). Further support comes from the clustering of HERV-E elements observed during in situ hybridization mapping (Taruscio and Manuelidis, 1991). One cluster of HERV-E elements was revealed during the characterization of the amylase gene family (Emi et al., 1988; Samuelson et al., 1990). A HERV-E-related clone, NP-2, is a full length proviral sequence present in two copies on the Y chromosome (Silver et al., 1987). Conservation of cellular flanking sequences suggests that the second copy of NP-2 results from gene duplication rather than from proviral insertion.

HERV-E-related sequences have been detected in the genomes of Old World monkeys, apes and humans (cited in Repaske et al., 1985). Clone 4-14 has been demonstrated by PCR analysis to be present in the same location in the genomes of Old World monkeys, apes and humans (Shih et al., 1991), indicating that this element inserted into the primate lineage in a common catarrhine ancestor at least 30 MYr ago. Some full length elements are found associated within similar regions of amplified DNA in both chimpanzees and humans (Steele et al., 1986), suggesting that the amplification occurred before the chimpanzee/human divergence 5 MYr ago.
Thus it appears that the ancestral HERV-E element(s) entered early into the primate lineage and have expanded in the genome through successive integrations into the germ line (dispersed single elements) and through genome amplification events (clustered elements).

HERV-E-related transcripts have been detected in human placenta, spleen, normal colon mucosa and primary colon cancers, and breast and colon carcinoma cell lines (Gattoni-Celli et al., 1986; Rabson et al., 1983; 1985). Major RNAs of 3.0 and 1.7 kb and minor RNAs of 3.6 and 2.2 kb were seen. Analysis of partial cDNA clones was consistent with the interpretation that these RNAs represent spliced env transcripts, with the smaller two transcripts deriving from elements with two deletions (Rabson et al., 1985). The size difference between the major and minor RNAs suggests the use of alternative splice sites ~500 bp apart. Full length HERV-E transcripts have also been detected (unpublished results cited in Taruscio and Manuelidis, 1991; Yeh et al., 1991). Finally, non-HERV-E cellular sequences apparently being polyadenylated by an HERV-E LTR have also been identified (Rabson et al., 1985; Tomita et al., 1990). The identification of possible HERV-E protein products has not been reported.

(b) ERV1

The ERV1 sequence was isolated from a human genomic library using a pol region probe derived from a chimpanzee endogenous provirus termed CH2 (Bonner et al., 1982). ERV1 is a truncated provirus of ~8 kb having gag, pol and env domains but lacking a 5’ LTR. The 5’ PBS, if still present, has not been characterized. Under high stringency hybridization conditions ERV1 has been resolved as a single copy element in the human genome and has been mapped to chromosome 18q22-23 (O’Brien et al., 1983; Renan and Reeves, 1987).
However, under low stringency conditions, hybridization revealed that there may be as many as 15 related elements in the human genome (O’Brien et al., 1983). Sequence comparisons between the pol region of ERV1 and that of CH2, as well as a comparison between the 3’ flanking sequences of each element, show ERV1 and CH2 to be the same provirus occupying orthologous sites in the two species. Thus, ERV1 has been in the primate genome for at least 6 MYr. ERV1-related sequences have also been detected in Old World monkeys (unpublished results cited in O’Connell et al., 1984), suggesting an early entry into the primate lineage. ERV1 expression has not been detected in human placenta, which is not surprising considering ERV1 lacks a 5’ LTR.

(c) ERV3

ERV3 was also isolated from a human genomic library using the chimpanzee provirus CH2 pol probe as well as the LTR of BaEV (O’Connell et al., 1984). ERV3 is a single copy, full length proviral genome that has been mapped to chromosome 7 (O’Connell et al., 1984). The element is 9.9 kb in length, is bounded by ~590 bp LTRs, and has gag, pol and env domains and a PBS sequence complementary to arginine tRNA (hence the alternate name HERV-R suggested by Larsson et al., 1989). As was the case for ERV1, hybridizations performed under low stringency conditions suggested the presence of 10 or more additional ERV3-related sequences in the human genome (O’Connell et al., 1984). ERV3-related sequences can be found in other primates including apes (but not gorilla), Old World monkeys and New World monkeys (cited in Cohen and Larsson, 1988).

Sequencing of ERV3 has shown it to be defective, with all three retroviral coding regions being disrupted by mutations. The env region does contain a 1.9 kb ORF that spans the surface glycoprotein domain.
However, a mutation within the transmembrane domain would prevent any protein product from being anchored in the cell membrane.

Interestingly, antibodies raised against env-derived synthetic peptides do detect putative env-encoded proteins in placental tissue, suggesting that ERV3 or ERV3-related sequences are translationally competent (Larsson et al., 1993). ERV3 is highly expressed in normal human placenta, with major transcripts of 3.5, 7.3 and 9 kb (Kato et al., 1987). These transcripts all represent spliced transcripts analogous to retroviral subgenomic mRNAs. However, only the 3.5 kb transcript terminates as expected within the 3’ LTR. The 7.3 and 9 kb transcripts extend into 3’ flanking cellular DNA sequence (Kato et al., 1987). Interestingly, contained within this flanking DNA is a 1.3 kb ORF deduced to encode a zinc finger protein related to the Kruppel gene of Drosophila (Kato et al., 1990). A wide survey of tissue samples and cell lines (Cohen et al., 1988; Kato et al., 1988) showed the ERV3 transcripts to be differentially expressed. The 3.5 and 9 kb RNAs were widely expressed, though in some samples only the 3.5 kb transcript was detected. The 7.3 kb transcript was only detected in samples of placental chorionic villi. A most striking result was that little or no ERV3 expression could be detected in 6 choriocarcinoma cell lines. Choriocarcinomas are derived from the placental trophoblastic epithelium, the outermost cells of the chorionic villi and high expressers of ERV3 RNA. ERV3 was not structurally altered in these choriocarcinoma cells, nor were there differences in methylation patterns of the ERV3 genome between these cells and normal placenta (Kato et al., 1988). Possibly, choriocarcinoma cells are defective for a specific transcription factor or contain point mutations in the ERV3 transcriptional regulatory sequences.

(d) RRHERV-I

The RRHERV-I family was discovered by chance during an unrelated PCR cloning strategy (Kannan et al., 1991).
This family consists of ~20 copies in humans. A similar number of elements can also be detected in the genomes of Old World monkeys and apes, indicating that, like other HERVs, RRHERV-I has been present in the germline for at least 30 MYr (Kannan et al., 1991). Expression of RRHERV-I has been detected in 9117 cells, a subline of the teratocarcinoma cell line PA-1, when the cells are treated with retinoic acid (Kannan et al., 1991). The single transcript is ~4 kb in size, suggesting that RRHERV-I, like HERV-E and ERV3, is primarily expressed as a spliced sub-genomic RNA. This was also suggested by the 3.3 kb composite cDNA structure for the RRHERV-I element, assembled from several cDNA clones (Kannan et al., 1991). The isolation of a genomic clone has not been reported. The cDNA clones contain stop codons in the putative env ORF, rendering this transcript incapable of encoding a protein product (Kannan et al., 1991).

A recent review on HERVs (Wilkinson et al., in press) has grouped the above HERV families into a larger superfamily. This was done because comparisons of env and LTR regions between various clones from these different families indicated that the elements were significantly related despite the fact that they have different PBSs. Nucleotide similarity among env and LTR regions is particularly sensitive for detecting closely related elements since among infectious retroviruses it is known that these regions diverge faster than other parts of the genome (Doolittle et al., 1989; Desai et al., 1986; Shimotohno et al., 1985). Wilkinson et al. (in press) have suggested the name HERV-ERI for the superfamily, which would reflect the fact that several HERV families with different PBSs (glutamic acid ‘E’; arginine ‘R’; isoleucine ‘I’) are included in the superfamily.

(e) S71

S71 was isolated from a human genomic library under reduced stringency hybridization conditions using simian sarcoma-associated virus (SSAV)-derived probes (Leib-Mosch et al., 1986).
Under high stringency conditions S71 can be distinguished as a single element within the human genome and has been mapped to chromosomal position 18q21 (Brack-Werner et al., 1989). Under low stringency hybridization conditions, S71-derived probes detect an additional 15-20 related sequences in human DNA (Leib-Mosch et al., 1992). A recent phylogenetic study suggests that S71 is present in Old World but not New World monkeys (Brack-Werner et al., 1993).

The S71 element appears to be a composite of retrovirus related and unrelated sequences and, as a result, the presence of short direct repeats was used to delineate the 5.4 kb element. The retrovirus related sequences of S71 constitute a truncated provirus containing type C retrovirus-related gag and pol gene sequences and a SSAV-related 3’ LTR-like sequence of 535 bp (Leib-Mosch et al., 1986). The gag gene is complete but the 5’ half of the pol gene, normally containing the protease and reverse transcriptase, is replaced by a ~1.1 kb sequence which has recently been identified as a solitary HERV-K LTR oriented in the antisense direction (Leib-Mosch et al., in press). Further analysis of this deletion of pol sequences and the insertion of a solitary HERV-K LTR revealed these events to be independent of one another, the insertion occurring just upstream of the deletion breakpoints. In addition, it appears that the deletion occurred first, being identifiable by PCR in Old World monkeys, while the HERV-K insertion occurred within the ape lineage (Leib-Mosch, personal communication). S71 related sequences having an intact pol domain have been identified by a PCR strategy that used S71-specific primers that spanned the pol deletion. Sequencing of the PCR clone revealed several stop mutations disrupting the pol ORF (Leib-Mosch, personal communication). S71 and S71-related elements also contain a truncated env domain.
The first 400 bp at the 5’ end of the S71 element is also unrelated in sequence to retroviruses (Werner et al., 1990).

S71 expression has not been detected by Northern analysis using S71-specific probes (Leib-Mosch, personal communication). A previous report of transcripts detected in the leukemic cell line K562 (Leib-Mosch et al., 1992) had used an S71 probe that included part of the then unidentified HERV-K LTR sequence. The observed hybridization was due to this HERV-K sequence (Leib-Mosch, personal communication). However, a 1.1 kb cDNA containing a 350 bp S71-related region has been isolated from a placental library (Leib-Mosch et al., 1992).

(f) RTVL-I

RTVL-I elements were first identified during the analysis of the haptoglobin-related locus on chromosome 16 (Maeda, 1985). Full length elements are 9 kb in length, with internal sequences related to gag, pol and env, a PBS for isoleucine tRNA, and flanked by ~500 bp LTRs (Maeda, 1985; Maeda and Kim, 1990). The name RTVL-I comes from retrovirus-like element with an isoleucine tRNA PBS; the alternate name, suggested by Larsson et al. (1989), is HERV-I. The RTVL-I elements associated with the haptoglobin-related locus have been sequenced in their entirety and have been shown to be disrupted by numerous mutations rendering the elements incapable of encoding retroviral-like proteins. In addition, the two RTVL-I elements are also disrupted by Alu insertions (Maeda and Kim, 1990). Investigation of RTVL-I expression at the RNA level has not been reported.

Copy number determinations for the RTVL-I family vary depending upon the probe used. Gag region probes indicate 8 copies in the human genome while env-specific probes detect ~25 copies (Maeda and Kim, 1990). This suggests that many RTVL-I elements are deleted or truncated.
RTVL-I sequences appear to have been present in the primate lineage before the Old World monkey/ape divergence, as RTVL-Ia, one of the elements in the haptoglobin-related locus, has been determined to be integrated at the same position in the genomes of Old World monkeys and apes (Shih et al., 1991).

(g) ERV9

The first ERV9 element isolated, cDNA clone pHE.1, was identified during the characterization of cDNA clones containing a novel class of repetitive sequences that are predominantly expressed in undifferentiated NTera2D1 cells [NTera2D1 is a teratocarcinoma cell line], and whose expression is negatively regulated upon retinoic acid induced differentiation (La Mantia et al., 1989; 1991). This 4 kb cDNA had regions of gag, pol, and env homologies, although the env region appeared to be partially deleted and the 5’ end of the clone did not extend through a complete gag (La Mantia et al., 1991). Isolation of a genomic ERV9 element revealed exceptionally long, 1.8 kb LTRs flanking ~6 kb of internal sequence that contained an arginine tRNA PBS (La Mantia et al., 1991). Within the LTRs are two types of novel repeats, 72 bp “B elements” and 41 bp “E elements”. Preliminary analysis of several genomic clones indicated that ERV9 LTR sequences are heterogeneous in length and the length variation is due to the number of tandemly repeated E and B subelements (Lania et al., 1992). Sequenced elements have all been shown to be disrupted by mutations and are therefore noncoding. Southern analysis suggests that there are ~40 copies of ERV9-related sequences in the human genome (La Mantia et al., 1991) plus an additional 3000-4000 solitary LTRs (Zucchi and Schlessinger, 1992). Expression of an 8 kb ERV9 transcript corresponding to a full length RNA has been detected in undifferentiated NTera2D1 cells and is down-regulated when the cells are induced to differentiate by retinoic acid (La Mantia et al., 1991).
Deletion analysis of the ERV9 LTR has identified the minimal promoter region, which spans from -70 to +6 relative to the major transcription start site and contains a putative Sp1 binding site and a putative initiator element necessary for correct transcription start site utilization. A TATA box typical of other retroviral LTRs was not identified (La Mantia et al., 1992). The repeat elements that apparently are associated with expression in undifferentiated NTera2D1 cells were not necessary for transcription (La Mantia et al., 1992).

(h) HRES-1

HRES-1 is an HTLV-1-related endogenous sequence isolated by low stringency screening of a human genomic library with an HTLV-1 LTR-gag probe (Perl et al., 1989). HRES-1 is present in a single copy that has been mapped to chromosome 1q42 (Perl et al., 1989; 1991). An HRES-1 related sequence can also be detected in the genomes of both Old World and New World monkeys but not in other mammals, indicating that this element entered the primate lineage at least 45 MYr ago.

Nucleotide sequence analysis of ~2 kb of the HRES-1 genomic clone revealed a single 5’ LTR-like sequence, 684 bp in length, followed by a PBS most closely matching histidine tRNA and a gag-related sequence containing two potential overlapping, alternative ORFs. Only one of these ORFs, encoding a putative Mr 25,000 protein, has detectable sequence homology with retroviral gag proteins and then only in two short stretches. Expression of HRES-1 has been shown in MA-T cells (cultured T lymphocytes of a patient, MA, with type II cryoglobulinemia), melanoma cells, HL-60 promyelocytic leukemia cells, MOLT-4 T-cell leukemia cells, normal placenta and EBV-transformed normal human peripheral blood B-lymphocytes (Perl et al., 1989). HRES-1 transcripts are 6 kb in length. Two antisera generated against polypeptides derived from each of the two short domains of HTLV homology both recognize a 28 kDa protein on Western blots of H9 human T-cell leukemia cell lysates (Banki et al., 1992).
This would be the predicted size for a protein encoded by the HTLV-related ORF.

(i) HERV-P

HERV-P elements were identified by screening a human genomic library using as a probe sequences complementary to the proline tRNA PBS (Harada et al., 1987; Kroger and Horak, 1987). Several known retroviruses use tRNAPro molecules as primers for reverse transcription, including the infectious human retroviruses HTLV-I and HTLV-II. The HERV-P clones isolated ranged in size from 5 to 9 kb and were classified into three families, HERV-P1, -P2, -P3 (HuERS-P1, -P2, -P3, as originally termed by Harada et al., 1987), because their LTR sequences were completely different. HuERS-P1 is present in 10-20 copies in the human genome. Two elements have been isolated. One is ~8 kb in length with 690 bp LTRs. The other is ~7.5 kb in size and has an Alu insertion within the 5’ LTR. HuERS-P1 related sequences have been detected in Old World monkeys (Harada et al., 1987). The HuERS-P2 family is comprised of 20-30 copies in the human genome, with the one characterized element having 890 bp LTRs flanking ~3.2 kb of internal sequence. The short internal region suggests that this element is not complete for retrovirus-related sequences. HuERS-P2 related sequences can be detected in Old World monkeys (Harada et al., 1987). The HuERS-P3 (Harada et al., 1987) or HuRRS-P (Kroger and Horak, 1987) family is comprised of 20-40 elements. The one full length element isolated is 8.1 kb in length and contains ~630 bp LTRs. HuERS-P3 related sequences can be detected in the genomes of apes, Old World and New World monkeys (Kroger and Horak, 1987; Harada et al., 1987). For all three HERV-P families, the internal sequences remain largely uncharacterized so it is not known what homologies to retroviral gene sequences exist and whether LTR differences between the families will be reflected by differences in the internal sequences.
The expression of HERV-P elements has also not been studied.

2.3.2 Class II HERVs (HERV-K)

Class II HERVs were identified by low stringency hybridization with DNA probes encompassing various regions of the mouse mammary tumor virus (MMTV) genome (Callahan et al., 1982; Deen and Sweet, 1986; Westley and May, 1984) or by using a probe from the pol gene of the Syrian hamster IAP (Ono, 1986). The prototypic Class II HERV is 9.2-9.5 kb in length, has ~970 bp LTRs and contains a PBS for lysine tRNA, hence the common name HERV-K with K denoting lysine (Callahan et al., 1985; Ono et al., 1986). The proviral genome represents a mosaic of sequences characteristic of different infectious retroviral genera, with env gene sequences most closely related to type A (IAP) retroviruses, LTR sequences most homologous to type D viruses and pol gene sequences related to each of these as well as to mammalian type B and avian type C viral genomes (Callahan et al., 1985; Ono et al., 1986; Horn et al., 1986). Such a mosaic structure suggests a complex evolutionary origin. There are ~50 copies of the HERV-K proviral sequence present in the human genome (Ono, 1986) as well as ~25,000 copies of LTR-related sequences (Leib-Mosch et al., in press). The LTR probe used to determine the copy number of solitary HERV-K LTRs does not hybridize with SINE-R elements, which are derived partly from HERV-K LTR sequences (Ono et al., 1987a) (see section C-3.1 below).

HERV-K elements have been mapped to human chromosomes 1 (HLM-2) and 5 (HLM-25) and chromosomes 7, 8, 11, 14, and 17 (Horn et al., 1986). Interestingly, the HERV-K LTR sequences are distributed throughout the human genome in an irregular, apparently nonrandom manner, with certain chromosomes containing hundreds to thousands of HERV-K LTR sequences (e.g. chromosomes 3 and 16) while other chromosomes carry only a few copies (chromosome 18) or completely lack HERV-K LTR sequences (the X chromosome) (Leib-Mosch et al., in press).
This apparent clustering suggests that some expansion of HERV-K LTR sequences within the genome occurred through amplification of the surrounding genomic DNA. A mechanism of this kind has been shown for the HERV-E family (Steele et al., 1984; 1986; Repaske et al., 1983). Another mechanism for the generation of solitary HERV-K LTRs is through LTR-LTR recombination that deletes the internal sequence and one LTR of an HERV-K provirus. The fact that solitary HERV-K LTRs show target site duplications of 6 bp that are typical of integrated proviruses supports this mechanism (Leib-Mosch et al., in press).

There is a second, distinct HERV-K family, containing a lysine tRNA PBS sequence that is different from that of the HERV-K family described above (May et al., 1983; May and Westley, 1986). The above mentioned HERV-K family has a PBS for lysine tRNA1,2 (anticodon CUU) while this second HERV-K family has a PBS for lysine tRNA3 (anticodon UUU). This second family has LTRs only ~420 bp in length, again distinct from HERV-K (CUU).

HERV-K related sequences have been detected, by Southern analysis, in the genomes of humans, apes and Old World monkeys, but not in New World monkeys and prosimians (Mariani-Costantini et al., 1989). This suggests that the ancestral HERV-K element(s) entered the genome of Old World primates after the divergence of New World monkeys but before the evolutionary radiation of large hominoids (Mariani-Costantini et al., 1989). There is evidence that some HERV-K sequences may still be transpositionally active. Craig et al. (1991) identified an insertion in the chimpanzee genome not present in humans, indicating a relatively recent insertion after the human-chimpanzee divergence. Also, one full length clone (HERV-K10), sequenced in its entirety (Ono et al., 1986), possesses LTRs that are nearly identical, containing just two nt differences over 968 bp. This suggests a relatively recent insertion into the human genome.
It is interesting that this element, unlike most HERVs that have been sequenced to date, also contains very few mutations interrupting its putative coding regions.

HERV-K transcripts, including an 8.8 kb full length mRNA, have been detected in various cell lines as well as human placenta (Ono et al., 1987b; Franklin et al., 1988). In general, placenta was found to express the highest levels of HERV-K transcripts (Franklin et al., 1988). Stimulation of HERV-K expression has been observed in steroid-treated T47D breast carcinoma cells, suggesting that HERV-K LTRs contain a hormone responsive element (HRE) similar to that of MMTV (Ono et al., 1987b). Recently, HERV-K expression has been detected in teratocarcinoma cells (Lower et al., 1993a; 1993b). A full length mRNA is detected plus smaller spliced transcripts including a singly spliced 3.3 kb subgenomic env RNA and a doubly spliced 1.8 kb RNA. This 1.8 kb transcript is of particular interest because it contains an ORF that has the potential to encode a 12 kDa protein. Analysis of the predicted protein sequence revealed a stretch of basic amino acids that closely resembles the putative RNA binding motifs of the regulatory proteins of lenti- and spumaviruses (Lower et al., 1993a). Transcripts containing HERV-K LTR sequences are expressed in a variety of human tissues but most of these appear to represent solitary HERV-K LTRs expressed as parts of unrelated cellular transcription units (Leib-Mosch et al., in press).

Sequence analysis of expressed HERV-K genomes revealed that these mRNAs contain long ORFs with the potential to code for a complete set of viral structural proteins (Lower et al., 1993a). A HERV-K gag gene has been expressed in E. coli as a fusion protein, with the HERV-K portion having the predicted size of 73 kD (Mueller-Lantzsch et al., 1993).
Expression of the gag gene with the 3’ protease gene results in a fusion protein withautoproteolytic activities resulting in the processing of the gag protein (Mueller-Lantzsch etal., 1993). Thus, at least some HERV-K elements encode a functional protease. Finally,there is now substantial evidence that HERV-K encodes the retrovirus-like particles,termed HTDV (iuman teratocarcinoma derived viruses), expressed in relatively highamounts in the GH teratocarcinoma cell line and which represent immature virions25apparently arrested in budding stages. An anti HERV-K gag antiserum has been shown toreact specifically with HTDV particles (Boller et al., 1993). The antiserum reacts primarilywith a protein of 30 kD (Boller et al., 1993), which is the predicted size for the putativemajor core protein (p30) of HERV-K (Mueller-Lantzsch et al., 1993). Lastly, the expressionof the HERV-K gag gene has been shown to be sufficient for particle formation whentransfected into mammalian cells (Korbmacher et al., 1993). It is also possible that HERVK encodes the steroid induced retrovirus-like particles observed in the T47D breastcarcinoma cell line (Keydar et al., 1984). These particles have been shown to be related toMMTV and to be of endogenous origin (Faff et al., 1992). However, a direct association withHERV-K has not yet been shown.2.4 The RTVL-H familyThe RTVL-H family is a large Class I family of HERVs which is the subject of thisthesis. This family consists of 1000 elements and a similar number of solitary RTVL-HLTRs per haploid human genome (Mager and Henthorn, 1984). At least some of the solitaryLTRs have arisen through LTR-LTR recombination that deletes the internal sequence andone LTR (Mager and Goodchild, 1989). The elements are distributed on all chromosomeswith possible clusters on chromosomes ip, 7q and lip (Fraser et al., 1988). RTVL-Helements are flanked by 5 bp direct repeats, are bounded by 400-450 bp LTRs, and mostare 5.8 kb in length (Mager and Freeman, 1987). 
These elements have pol and partial gag homology to murine type C retroviruses but also have a distinct region of gag homology to human T cell leukemia virus (HTLV) types I and II (Mager and Freeman, 1987). The major 5.8 kb class of RTVL-H sequences has no env region and has 4 shared deletions in the pol gene (with respect to MLV; see Fig. 1-3) and termination codons, including several that are shared, which render them translationally defective (Wilkinson et al., 1990; Mager and Freeman, 1987). Recently, a subpopulation of RTVL-H elements has been identified that contains some or all of the pol regions deleted in most elements (Wilkinson et al., 1993; Hirose et al., 1993) (Fig. 1-3c). We have termed this subpopulation RTVL-Hp. Also, env-containing RTVL-H elements have been identified (Wilkinson and Mager, 1993; Hirose et al., 1993). Other RTVL-H subpopulations can be defined on the basis of their LTR sequence. Sequence comparisons of >25 LTRs have led to the classification of three LTR subtypes, Type I, Ia and II (Mager, 1989; Wilkinson, 1993).

Figure 1-3. Characterization of RTVL-H pol region deletions. A) Dot matrix comparison plot of the pol-related region of RTVL-H2 with the pol gene of MLV. A translation of the RTVL-H2 sequences between nucleotide (nt) positions 2591-4727 (Mager and Freeman, 1987) which maximizes homology was compared to the MLV pol product predicted by the published MLV sequence, nt positions 2223-5837 (Shinnick et al., 1981). Boxes labelled A, B, C and D indicate the regions deleted in the RTVL-H sequence. The sequence comparison was done with the “compare” program of GCG using a stringency of 13 in a window of 30 aa. B) A schematic presentation of the functional domains of MLV pol and the effect of the deletions on the homologous sequences in RTVL-H. The stippled boxes represent the reverse transcriptase domain, the hatched boxes represent the RNase H domain, and the boxes filled with diagonal lines represent the endonuclease domain. The regions shown begin at the protease cleavage site between the protease and reverse transcriptase peptides encoded by the pol gene. C) Dot matrix comparison of the pol-related translation of an artificially constructed RTVL-Hp sequence with that of the MLV pol gene. The RTVL-Hp sequence was constructed by inserting the sequences of regions A, B and C, determined from different RTVL-H isolates, into the sequence of RTVL-H2 at the appropriate positions. (modified from Wilkinson et al., 1993)

RTVL-H elements are expressed in a variety of human cell lines including bladder carcinoma cell lines, HeLa and, most highly, in embryonal carcinoma or teratocarcinoma cell lines (Wilkinson et al., 1990). RTVL-H expression has also been detected in some human primary tissues including placental amnion and chorion (Wilkinson et al., 1990; Johansen et al., 1989). The primary transcript produced is a unit length 5.6 kb RNA derived from the most abundant 5.8 kb class of deleted elements. A 3.7 kb spliced transcript is also commonly observed, the result of a splicing event from a common splice donor (SD) site located in the 5’ end of the internal sequence to one of a cluster of splice acceptor (SA) sites approximately 2 kb downstream and just upstream of the pol domain (Wilkinson et al., 1990). This type of splicing event, which removes gag while leaving pol intact, has not been observed for other retroviruses and retrovirus-like elements. If a probe specific for one of the pol regions found only in RTVL-Hp elements is used, transcripts can be detected only in the teratocarcinoma cell line Tera-1, with one transcript, 6.5 kb in size, corresponding to the size predicted for a unit length RTVL-Hp RNA containing an intact pol (Wilkinson et al., 1993).
Larger transcripts of 8.0 and 8.6 kb are also seen and most likely represent RNAs derived from env-containing elements (Wilkinson et al., 1993; Hirose et al., 1993).

The LTRs have been shown to contain the regulatory elements necessary to direct RTVL-H transcription (Wilkinson et al., 1990). RTVL-H LTRs can also promote, enhance and polyadenylate transcripts of heterologous reporter genes (Feuchter and Mager, 1990; Mager, 1989). Non-RTVL-H cellular sequences apparently being promoted by an adjacent LTR have also been identified (Feuchter et al., 1992; Feuchter-Murthy et al., 1993; Lui and Abraham, 1991). Both the promoter activity in transient assays and the level of endogenous RTVL-H expression are dependent on cell type (Feuchter and Mager, 1990; Wilkinson et al., 1990), suggesting that tissue-specific transcription factors interact with the LTRs. SV40 large T antigen has been shown to activate transcription from several LTRs (Feuchter and Mager, 1992). Also, while teratocarcinoma cell lines express high levels of RTVL-H transcripts, when these cells are induced to differentiate with retinoic acid, RTVL-H transcription ceases (Wilkinson, 1993).

No RTVL-H element capable of encoding retroviral proteins has yet been isolated. However, the great expansion of clearly defective elements within the genome suggests that their retrotransposition has been facilitated by complementation, possibly by coding-competent RTVL-H. Indeed, some of the structurally intact RTVL-H elements that have recently been isolated contain fewer mutations and ORFs of >1 kb in the pol and/or env domains (Hirose et al., 1993; Wilkinson et al., 1993).

C. EVOLUTIONARY HISTORY OF RETROELEMENTS

1. SINEs

1.1 The origin of SINEs

The origins of transposable elements can be surmised from sequence and structural similarities to other genetic elements of known function. SINEs, since they contain an RNA pol III promoter, are thought to be derived from RNA pol III-transcribed genes.
For example, the primate Alu family and the rodent B1 family are both thought to be derived from the 7SL RNA genes (Ullu and Tschudi, 1984). 7SL RNA is an abundant cytoplasmic RNA which functions in protein secretion as a component of the signal recognition particle. Another primate SINE, PRE, represents a repetitive family derived from an unknown structural RNA (Ricke et al., 1992). SINE families from other mammalian species appear to be derived from different tRNA genes, each apparently arising independently in the different mammalian lineages (Daniels and Deininger, 1985; Deininger, 1989 and references therein). For example, the Monomer family is an independent SINE family in the genome of the prosimian Galago. Originally proposed to be derived from tRNAMet (Daniels and Deininger, 1985), the Monomer sequence was shown by a more recent comparison with tRNAs to be more likely to have originated from tRNALys (cited in Ohshima et al., 1993).

Since most SINE families are derived from parental genes that are not highly amplified, mutational and/or structural modifications of the parental gene appear to be needed to increase the efficiency of the retroposition process before the SINE can be amplified (Daniels and Deininger, 1985; Deininger and Daniels, 1986). Several factors may be important for the efficient amplification of SINEs. First, the presence of an internal RNA pol III promoter allows for retrotransposed copies to also be transcribed. It is interesting that the promoter sequences of both Alu and Monomer families show a better match to the RNA pol III promoter consensus than do their parental genes (Quentin, 1992b; Daniels and Deininger, 1991). This may have allowed Alu and Monomer elements to escape the requirement for upstream activating sequences (UAS) that are required by 7SL RNA and tRNA genes, in addition to the internal promoter, for efficient transcription (Ullu and Weiner, 1985; Schmid and Maraia, 1992).
These UAS would be lost upon transposition. Also, the promoter needs to be functional in the germline as only germline transpositions can result in heritable sequence amplification. A second factor allowing for efficient retroposition may have been the acquisition of the A-rich sequence at the 3’ end of the SINE element, which is believed to allow for the self-priming of SINE transcripts during reverse transcription (Schmid and Maraia, 1992). Finally, secondary structure that has been retained throughout the evolution of Alu and B1 families is believed to allow for an increased association of these RNAs with reverse transcriptases and other components of the retrotransposition machinery (Labuda et al., 1991; Sinnett et al., 1991). However, the fact that SINE families show less secondary structure than the parental genes may help eliminate possible interference with reverse transcription (Daniels and Deininger, 1985). The small size of SINE transcripts may also have facilitated their packaging into retrovirus-like particles.

1.2 The primate Alu family

The evolutionary history of the Alu family has been extensively studied. As discussed above, a typical Alu element is a dimeric structure, composed of two related monomeric sequences arranged in tandem (Deininger et al., 1981). Each monomer is partially homologous with the 7SL RNA gene, from which they are thought to have arisen through a deletion of the central 7SL-specific sequence (Ullu and Tschudi, 1984). It has been proposed that the evolution of the Alu family occurred in two phases (Quentin, 1992b). The first phase involved only monomeric elements and was characterized by large alterations of the sequences leading to successive replacements of Alu-like progenitors. It has been proposed that large alterations might have been required in the first step of evolution so that, as the Alu progenitors amplified, they did not interfere with the parent 7SL RNA gene (Daniels and Deininger, 1985).
One proposed evolutionary history of the Alu family is as follows: The FAM (fossil Alu monomer) family (Quentin, 1992b) arose from a 7SL RNA gene. This family gave rise to the FRAM and FLAM (free right and free left Alu monomer) families (Quentin, 1992b; Jurka and Zuckerkandl, 1991). The fusion of a FLAM and a FRAM sequence produced the first dimeric Alu element (Quentin, 1992a; 1992b). The second phase started with the first Alu dimeric element and has been characterized by the stabilization of the progenitor sequences, in that the progenitor sequences of the Alu dimeric elements have evolved only through base substitutions and small insertions and deletions (Britten et al., 1988; Jurka and Smith, 1988; Jurka and Milosavljevic, 1991; Shen et al., 1991).

The identification of a SINE family in rodents, the B1 family, that is also derived from the 7SL RNA gene, suggests that the progenitor FAM element(s) may have been present before the rodent-primate divergence 65 MYr ago. Rodent B1 elements are monomeric and are similar to the left Alu monomer (Schmid and Jelinek, 1982). Sequence divergence and amplification of the progenitor sequence(s) after the rodent and primate lineages diverged could have resulted in the creation of species-specific variants. However, it cannot be ruled out that these two families arose independently from the 7SL RNA gene (Weiner et al., 1986).

The dimeric Alu elements arose early in the primate lineage, being found in the genomes of all primates studied to date including the prosimian, galago (Daniels et al., 1983).
Alu evolution within the primate lineage is characterized by a succession of distinct subfamilies that have arisen and amplified at different times during primate evolution (Britten et al., 1988; Quentin, 1988; Shen et al., 1991; Jurka and Milosavljevic, 1991). Subfamilies are identified on the basis of sets of coordinated, diagnostic changes, with each successive subfamily adding new diagnostic changes to those of the previous subfamily. This has led to the ‘master gene’ model of Alu evolution (reviewed in Deininger et al., 1992) in which a master gene locus is responsible for the amplification of all Alu subfamilies. Subfamilies arise through the simple accumulation of mutations in the master gene with subsequent amplification. The reason for the punctuated nature of the mutation rate is unclear. This theory also implies that Alu copies derived from this master gene are not active in further transpositions. Recently, however, evidence for multiple actively retroposing Alu source genes has been reported (Hutchinson et al., 1993), supporting an alternate ‘transposon’ model. In this model, copies of the source gene may themselves be transpositionally active. This results in the generation of Alu subfamilies that may share some mutations but differ in others (Hutchinson et al., 1993).

Most Alu amplification appears to have occurred quite early in the primate lineage, as 85% of the Alu family members appear to have been present before the ape/Old World monkey divergence 30 MYr ago (Shen et al., 1991). There have also been more recent amplifications, as evidenced by the identification of a human-specific Alu subfamily (Batzer and Deininger, 1991; Matera et al., 1990), indicating expansion of this subfamily within the last 4-6 MYr. That Alu transposition is still occurring within the present day human genome is evidenced by recent reports of de novo Alu insertions resulting in disease in several individuals (Vidaud et al., 1993; Wallace et al., 1991).

2. LINEs - THE L1 FAMILY

Hybridization studies have shown that L1 sequences are present throughout marsupials and placental mammalian orders (Burton et al., 1986), indicating that the current L1 elements are descended from a single family present in the ancestor to all mammals. The presence of structurally similar sequences in invertebrates and plants has suggested a very ancient origin for the L1 family. Examination of L1 sequences within the human genome has identified several L1 subfamilies based on shared sets of diagnostic changes (Jurka, 1989; Scott et al., 1987; Skowronski and Singer, 1986). Degrees of sequence divergence within the subfamilies have suggested that the subfamilies are of different evolutionary ages within the primate lineage (Jurka, 1989). Members of the youngest L1 subfamily appear to be the most readily transcribed (Jurka, 1989; Skowronski and Singer, 1986). These elements also appear to be transpositionally active in that the de novo L1 insertions into the factor VIII gene reported in 2 unrelated cases of hemophilia A (Kazazian et al., 1988) represent members of the youngest L1 subfamily (Jurka, 1989).

The ‘master gene’ model proposed for the evolution of the Alu family has also been proposed to explain L1 evolution (Deininger et al., 1992). However, under this model, the master gene(s) would produce only inactive copies. While this may be true for Alu elements, the isolation of the progenitor element for the factor VIII gene L1 insertion has shown that this progenitor is itself flanked by a target site duplication and therefore is itself the product of a retrotransposition event. These data favour a self-propagation model in which a number of active elements can produce both active and inactive progeny (Dombroski et al., 1991). For a particular active element to give rise to a subfamily would require the element to be transcribed in the germ line over an evolutionarily significant time span and to maintain its ability to encode a functional reverse transcriptase.
The observation that the L1 that inserted into the factor VIII gene was derived from a functional L1 progenitor suggests that L1-encoded proteins may preferentially act in cis (Dombroski et al., 1991). A second aspect of L1 evolution comes from the observation that members of the L1 family within a species are more similar to each other than are L1 elements from different species, a phenomenon referred to as concerted evolution (reviewed in Hutchison et al., 1989).

Two different explanations for concerted evolution of L1 have been proposed: (1) most L1 sequences within a species may have derived from recent proliferation of a small number (1-3) of sequences known as molecular drivers, or (2) preexisting L1 elements may have undergone gene conversion from one or a few master copies. The rapid transposition model predicts that most L1 elements occupy unique species-specific sites. The gene conversion model predicts that most sites of L1 insertion in a particular species would also be occupied by L1 in closely related species. One study examining orthologous sites within the β-globin gene cluster in different species of mice showed that the majority of sites possessing an L1 insertion in one species were vacant in the other, supporting the rapid transposition model (Casavant et al., 1988). In addition, Weiner et al. (1986) have suggested that a high copy number of L1 elements before the mammalian radiation would have served to “fix” the consensus sequence. However, if only a few copies of L1 progenitors were present at the time of the mammalian radiation 65 MYr ago, then subsequent mutations in these progenitor sequences followed by expansion would create the observed species-specific variants. Gene conversion may have played a secondary role but gene conversions are considered to be rare in mammals (Weiner et al., 1986).

3. LTR-CONTAINING RETROELEMENTS

Retrotransposons are closely related to retroviruses in overall structure, in coding capacity, and in their mechanism of replication.
This suggests a common evolutionary origin. However, the question has always been: “which came first?” Temin (1980) proposed that a cellular reverse transcriptase was initially incorporated into a simple endogenous element. Incorporation of genes for particle formation (i.e. gag genes) may have been required to prevent reverse transcription of cellular RNAs while increasing the efficiency of retroelement reverse transcription. Some elements became infectious retroviruses by acquiring an envelope gene. An alternate view is that retrotransposons represent degenerate retroviral proviruses that became fixed in the genome during a germ line infection. However, this does not explain their ultimate origin, i.e. where did the retroviruses come from?

A phylogenetic history of reverse transcriptase-containing retroelements has been constructed based upon conserved sequences within the reverse transcriptase (Xiong and Eickbush, 1990) and is summarized in Figure 1-4. The non-LTR retrotransposons are shown to be the oldest group of retroelements, a placement supported by their wider distribution than any other class of retroelements. Acquisition of LTR sequences and tRNA priming may have allowed retrotransposons to overcome the apparent shortcoming in LINE replication, that of incomplete reverse transcription. The LTR retrotransposons can be divided into two groups based upon the order of the functional domains within the pol gene. The copia-Ty1 group, which appears to be evolutionarily older, has the integrase domain located upstream of the reverse transcriptase domain, a rearrangement unique to this branch. The gypsy group has the integrase domain located downstream of the reverse transcriptase, an arrangement also found in retroviruses and the non-LTR retrotransposons.
It is interesting that though the integrase/endonuclease activity is as fundamental to retrotransposition as the reverse transcriptase activity, it has not been possible to make any alignment between the endonuclease domains of retroviruses and LINEs. This implies that the two different lineages acquired their endonucleases independently (Doolittle and Feng, 1992). It is also interesting that a sequence alignment of RNase H domains detects relationships between retroelements that are not congruent with the reverse transcriptase phylogeny (McClure, 1991). McClure (1991) has suggested that xenologous recombination (i.e. the replacement of a homologous resident gene by a homologous foreign gene) and/or independent gene assortment may have played a role in the evolution of retroelements. As the final step in retroelement phylogeny, the acquisition of an envelope domain allowed LTR retrotransposons to transpose between cells. That retroviruses represent the youngest group of retroelements is supported by their more limited distribution, being found only in vertebrate animals.

Figure 1-4. A scheme for the origin of retroelements (modified from Xiong and Eickbush, 1990). [Schematic; panels: precursor retrotransposon, non-LTR retrotransposons, LTR retrotransposons (gypsy group), LTR retrotransposons (copia-Ty1 group), retroviruses.]

The origin of the reverse transcriptase is unclear. A simple hypothesis would be that it derives from a DNA polymerase that acquired the ability to copy RNA as well as DNA templates (Whitcomb and Hughes, 1992). However, a cellular reverse transcriptase activity has recently been discovered in association with telomerases (Shippen-Lentz and Blackburn, 1990). This discovery has greatly strengthened the idea that retroelements originally evolved from cellular genes.

3.1 HERVs

The various HERV families appear to have originated as retroviral germline infections in an early primate ancestor.
For the HERV families which have been examined for their evolutionary distributions, all have been found in both apes and Old World monkeys, indicating these retroviruses entered the primate lineage before this divergence 30 MYr ago. HuERS-P3 related sequences have also been detected in New World monkeys, suggesting that this family has been present in the primate lineage for at least 45 MYr. Most HERV families have undergone limited expansion within the primate genome, with copy numbers ranging from 1 to only 100 copies per human genome. The exceptions are the RTVL-H family, which is present in 1000 proviral copies plus a similar number of solitary LTRs (Mager and Henthorn, 1984), and the HERV-K family which, while having only ~50 proviral copies, consists of ~25,000 solitary LTRs (Leib-Mosch et al., in press). Since all HERV families analyzed to date have existed in the primate genome for at least 30 MYr, Leib-Mosch et al. (in press) have concluded that the copy number of retroviral elements in the human genome is a direct measure of their transposition activity. Thus, RTVL-H and HERV-K may represent the most transpositionally active families during primate evolution. However, all HERV elements examined to date, including those of the RTVL-H and HERV-K families, have been shown to be disrupted by mutations that render the elements incapable of encoding infectious virions. Thus it appears that in most if not all cases, once a retroviral element became fixed in the genome, its sequence began to drift, accumulating random point mutations, deletions and rearrangements.

The presence of solitary LTRs has been reported not only for RTVL-H and HERV-K but for other HERV families as well, including HERV-E (Steele et al., 1984) and RTVL-I (Armour et al., 1989; Suzuki et al., 1990).
Solitary LTRs most likely arise through an LTR-LTR recombination event that deletes the internal sequences of that element, and this appears to be a natural part of HERV evolution.

Sequences from one HERV family, HERV-K, have evolved into a SINE family, SINE-R (Ono et al., 1987a). These elements are ~630 bp in length and are composed of a short stretch of HERV-K 3’ internal sequence followed by the LTR, which contains a 370 bp deletion and terminates in an adenosine-rich tract immediately downstream of the LTR polyadenylation signal. The 5’ end of these elements was not derived from HERV-K sequences and consists of 3-4 copies of a 40 bp GC-rich repeat (Ono et al., 1987a). These repeats are thought to represent an internal RNA pol III promoter though distinct transcripts from these elements have not been detected. SINE-R sequences are present in the human genome in 4000-5000 copies (Ono et al., 1987a).

3.2 THE-1 elements

The THE-1 family of elements is unique among LTR-containing retroelement families in that THE-1 elements do not contain any regions of homology to known retroviral genes. It is therefore not surprising that this family appears to have had an evolutionary history quite distinct from that of the HERV families. THE-1 LTRs are very similar to another repeat family present in the human genome, MstII repeats (Mermer et al., 1987), and the two are considered subfamilies of a single repeat family (Fields et al., 1992). Recently this THE-1/MstII repeat family has been reported to be part of a larger superfamily of mammalian LTR-retrotransposons (Smit, 1993) named ‘Mammalian apparent LTR-retrotransposons’ (MaLRs), that also includes the human MER15 and MER18 families and several rodent families including MT and ORR-1. Several ORR-1 and MT repeats flank sequences that resemble the internal sequence of THE-1 but generally the internal sequence has been excised leaving only an LTR-like sequence. Evidence suggests that the first MaLRs arose before the mammalian radiation 80-100 MYr ago (Smit, 1993).
Various MaLR families and subfamilies have subsequently arisen in different mammalian lineages. THE-1/MstII repeats represent a primate-specific MaLR family. The prosimian galago contains within its genome only a few THE-1 related sequences, indicating that the amplification to high copy number occurred in the primate lineage after the divergence from prosimians (Schmid et al., 1990).

4. EVOLUTION OF PRIMATES

Both molecular and morphological evidence has been used to reconstruct primate phylogeny. Classically, morphological characteristics have been used to determine the major primate branches and fossil records have been used to estimate the evolutionary times of the branchpoints (Gingerich, 1984). To summarize, the catarrhines or Old World primate lineage diverged from the platyrrhines or New World monkeys 45 MYr ago. Approximately 30 MYr ago, the Hominoidea or apes diverged from the Cercopithecoidea or Old World monkeys. The branching pattern within the hominoids could not be definitively reconstructed from morphological evidence. While there is agreement that the gibbons diverged first from other hominoids, disagreements exist about the subsequent branching events. Molecular evidence, ranging from protein studies such as immunological comparisons of serum proteins to DNA-DNA hybridizations and direct orthologous gene sequence comparisons that estimate degrees of DNA sequence divergence among primate lineages (reviewed in Goodman et al., 1989), has helped reconstruct hominoid phylogeny. However, the “molecular clock” is still calibrated using a divergence time estimated from fossil records (Sakoyama et al., 1987; Sibley and Ahlquist, 1987). In summary, within Hominoidea, the first branching separates hylobatines (gibbons) from the Ponginae-Homininae stem, and the second branching separates Ponginae (orangutans) from the common ancestor of Homo, Pan and Gorilla (i.e. the hominids).
Branching order within the hominids is still under debate though molecular studies that do resolve the branching pattern clearly favour two dichotomous branching events, the first separating Gorilla from the Homo-Pan stem and the second separating Homo (humans) and Pan (chimpanzees). Anthropologists disagree, believing that morphological evidence supports a Gorilla-Pan linkage that excludes Homo. However, fossil evidence has recently been discovered that does appear to support the molecular evidence (Begun, 1992). A summary of primate phylogeny is shown in Figure 1-5.

Figure 1-5. Primate phylogeny. The approximate time of each divergence is indicated in MYr. [Tree diagram; tip and clade labels: prosimians, New World monkeys (Platyrrhini), Old World monkeys, orangutan, gorilla, chimpanzee, human; Catarrhini, Hominoidea, Hominidae, Homininae, simians, primate ancestor.]

B. RETROTRANSPOSITION

1. MECHANISMS OF RETROTRANSPOSITION

Retrotransposition is the term used to describe a replicative process involving an RNA intermediate. There are three basic steps: (1) transcription; (2) reverse transcription; (3) integration. The first requirement is for the retroelement to be transcribed. The RNA transcript is then copied into DNA by the activity of a reverse transcriptase, a step that usually occurs within a cytoplasmic particle. The cDNA copy is then integrated into the genome at the site of a staggered break. Repair of this break upon integration results in the duplication of the target site, a hallmark feature of a retrotransposed element. All retroelements undergo these basic steps during retrotransposition though the mechanistic details vary between different retroelement families. Most details are only hypothesized based on structural and evolutionary data.
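The way a staggered break gives rise to a target site duplication can be illustrated with a toy model. The Python sketch below is not drawn from any experiment in this thesis; the function name integrate, the sequences, and the 5 bp stagger (chosen to match the 5 bp direct repeats that flank RTVL-H elements; other families differ) are all assumptions made for illustration. Only one strand is written out: the staggered cut leaves the same short stretch of target sequence as a single-stranded overhang on each side of the insert, and filling in both overhangs duplicates it.

```python
# Toy model of integration at a staggered break (top strand only).

def integrate(target, element, cut_pos, stagger=5):
    # The top strand is cut at cut_pos and the bottom strand `stagger` bp
    # further along, so target[cut_pos:cut_pos + stagger] is left
    # single-stranded on both sides of the incoming element.
    if not 0 <= cut_pos <= len(target) - stagger:
        raise ValueError('staggered cut must lie within the target')
    overhang = target[cut_pos:cut_pos + stagger]
    left = target[:cut_pos + stagger]   # left flank, ending in the overhang
    right = target[cut_pos:]            # right flank, starting with the overhang
    # Gap repair fills in both overhangs, so the overhang sequence is now
    # present on each side of the element as a direct repeat.
    return left + element + right, overhang


target = 'GGATCCTTAGCAGTCAAGCTT'   # hypothetical empty target site
element = 'TGTCATCATCAACA'         # hypothetical stand-in for a cDNA copy

product, tsd = integrate(target, element, cut_pos=6)
print(product)
print(tsd)
```

The integrated product is longer than target plus element by exactly the length of the stagger, which is why an occupied site can be recognized by comparing it with the corresponding empty site.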
However, for several retroelement families, transposition via an RNA intermediate has been experimentally demonstrated (Ty - Boeke et al., 1985; Curcio and Garfinkel, 1991; IAP - Heidmann and Heidmann, 1991; LINEs - Jensen and Heidmann, 1991).

1.1 Retrotransposons

Retrotransposons are retroelements which strongly resemble retroviruses in genome organization and in the mechanism of replication but lack env genes and thus do not have an extracellular phase in their life cycle. The life cycles of retrotransposons and retroviruses are compared in Figure 1-6. Much work has been done to determine the molecular details of the transcription, reverse transcription, and integration of retroviruses (reviewed in Luciw and Leung, 1992). Here I will briefly review what is known about these steps in the retrovirus life cycle and the evidence supporting similar mechanisms for retrotransposons.

Figure 1-6. Comparison of the life cycles of retrotransposons and retroviruses. Although the general replicative schemes are similar, expression of env proteins (shown as spiked ovals) allows retroviral virions to recognize and infect new cells (from Wilkinson et al., in press). [Diagram labels: integrated element (proviral form); transcription, RNA processing; translation; assembly; intracellular particle; reverse transcription; integration; and, for retroviruses only, budding and release, extracellular virus, binding to receptor and entry (infection).]

1.1.1 Transcription

Retroviral LTRs contain the cis-acting sequence elements that control the initiation and polyadenylation of viral transcripts (Fig. 1-7; Temin, 1982). Signals governing transcription initiation are located in the U3 region of the LTR.
A prototypical LTR U3 region contains a TATA box and an upstream element, the CCAAT box, that together constitute the basal promoter, plus additional upstream enhancer repeat elements that bind specific cellular factors that enhance or activate transcription from the basal promoter. Transcription is initiated 20-30 bp downstream of the TATA box. This cap site defines the boundary between the U3 and R regions of the LTR. Within the R region is the polyadenylation signal AATAAA. The site of polyA addition, at a CA dinucleotide ~20 bp downstream of the polyadenylation signal, defines the boundary between the R and U5 regions of the LTR. Although both 5’ and 3’ LTRs contain sequences for transcription initiation and polyadenylation, transcription begins in the 5’ LTR and ends in the 3’ LTR, yielding a full length viral RNA.

Figure 1-7. Transcriptional regulatory elements in the retroviral LTR. See text for details. [Diagram labels: enhancers; promoter elements (CCAAT, TATA); cap site (+1); polyadenylation signal (AATAAA); site of poly(A) addition; U3, R and U5 regions of each LTR flanking gag-pol-env; viral RNA.]

Sequence analyses of LTRs of retrotransposons have identified similar regulatory sequences in analogous positions to those of retroviral LTRs. For example, in the LTRs of copia-like elements, CCAAT and TATA boxes and AATAAA polyadenylation signals can all be identified and are located in the appropriate places to justify the structural and functional division of the copia LTR into the typical U3, R and U5 regions of retroviral LTRs (Echalier, 1989). These observations are also true for IAP elements (Kuff and Lueders, 1988), Ty elements (Garfinkel, 1992) and the HERV families such as RTVL-H (Feuchter and Mager, 1990). Transient assays for promoter/enhancer activity have demonstrated promoter activity for retrotransposon LTRs. For example, LTRs from a copia-like element 1731 (Echalier, 1989), IAP elements (Lueders et al., 1984; Horowitz et al., 1984) and RTVL-H elements (Feuchter and Mager, 1990) have all been shown to promote the activity of the chloramphenicol acetyltransferase (CAT) gene. Finally, the characterization of unit length transcripts of different retrotransposons has shown that transcription initiation and polyadenylation occur at the sites expected for regulation by LTR sequences (Echalier, 1989; Kuff and Lueders, 1988; Garfinkel, 1992; Wilkinson et al., 1990).

1.1.2 Reverse transcription

The current model for retroviral DNA synthesis by reverse transcription is shown in Figure 1-8 and has recently been reviewed (Luciw and Leung, 1992). The entire reverse transcription process takes place in a nucleoprotein complex in the cell cytoplasm. Briefly, the following steps occur (Fig. 1-8): (1) Viral DNA synthesis is initiated by reverse transcriptase from the host-cell tRNA primer; the 3’ end of the tRNA molecule is complementary to the minus-strand primer binding site (PBS) located near the 5’ end of the RNA genome. Reverse transcriptase elongates from the 3’ end of the tRNA molecule to the 5’ end of the viral RNA to produce minus-strand strong-stop DNA. (2) RNase H (a functional domain in the reverse transcriptase protein) degrades part or all of the R from the 5’ end of the viral genome. R’ in the minus-strand strong-stop DNA is now single-stranded and available for base pairing. (3) The exposed R’ hybridizes with the complementary R sequence at the 3’ end of the same or the second viral RNA molecule. This is the first of two strand transfers or “jumps”. (4) Reverse transcriptase elongates from the 3’ end of the minus-strand strong-stop DNA and copies the viral genome, including part of the minus-strand PBS.
(5) RNaseH degrades viral RNA in the DNA:RNA hybrid at the border with the 3’ U3 sequence, leaving an RNA oligonucleotide at the polypurine tract (PPT) near the 3’ end of the viral genome that primes plus-strand synthesis. From the PPT, reverse transcriptase copies the minus-strand strong-stop DNA through the tRNA sequence complementary to the PBS, giving the plus-strand strong-stop DNA. (6) The tRNA primer in the hybrid is removed by RNaseH, thereby exposing the PBS sequence at the end of the plus-strand strong-stop DNA. The second strand transfer is an intramolecular (i.e. intrastrand) event in which the PBS in the plus-strand strong-stop DNA forms a duplex with the complementary sequence at the other end of the minus-strand DNA molecule. (7) Reverse transcriptase then elongates from the PBS in both directions. A second strand displacement event separates the duplex between U3, R, U5 and U3’, R’, U5’. Accordingly, the LTRs are generated as reverse transcriptase continues to elongate through these displaced sequences until the ends of the templates are reached. [Figure 1-8. Model for retroviral DNA synthesis by reverse transcription. This schematic shows only the nucleic acid species that participate in the stepwise process. See text for details. From Luciw and Leung, 1992.] 
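The net bookkeeping of steps (1) to (7) can be illustrated with a few lines of Python. This is only a sketch of how an RNA genome bounded by R-U5 at its 5’ end and U3-R at its 3’ end gives rise to DNA carrying a complete U3-R-U5 LTR at each end; the function name and the segment labels are assumptions made for this illustration, and none of the enzymology (priming, RNaseH digestion, strand transfer) is modelled.

```python
# Toy bookkeeping model: segments are labels, not nucleotide sequences.
# The function name and labels are hypothetical, chosen for this sketch.

def proviral_layout(rna_segments):
    '''Return the segment order of the linear DNA product for a genomic
    RNA laid out as R, U5, <internal segments>, U3, R.'''
    if rna_segments[:2] != ['R', 'U5'] or rna_segments[-2:] != ['U3', 'R']:
        raise ValueError('expected an RNA of the form R-U5-...-U3-R')
    internal = rna_segments[2:-2]
    # Each strand transfer extends one end, so both ends of the DNA
    # carry the full U3-R-U5 unit (the LTR).
    ltr = ['U3', 'R', 'U5']
    return ltr + internal + ltr

rna = ['R', 'U5', 'PBS', 'gag', 'pol', 'env', 'PPT', 'U3', 'R']
dna = proviral_layout(rna)
print('-'.join(dna))
```

The point of the sketch is that U3 is present once in the RNA (3’ end) and U5 once (5’ end), yet both appear twice in the DNA product.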
The product of reverse transcription is a fully duplex, linear DNA molecule with LTRs and blunt termini. This form of viral DNA is in a cytoplasmic nucleoprotein complex that contains integrase (IN) and other virion proteins. Subsequently, the complex is transported into the nucleus for integration of the linear viral DNA intermediate into the host-cell genome to produce the provirus (see below).
Evidence indicates that this model of reverse transcription also applies to retrotransposons. First, virtually all retrotransposons contain sequences that have some homology with retroviral reverse transcriptase. RNaseH, protease and integrase homologies can also be identified. For several retrotransposon families, including Ty (Garfinkel et al., 1991), copia (Shiba and Saigo, 1983) and IAP (reviewed in Kuff and Lueders, 1988), an element-encoded reverse transcriptase has actually been identified. Second, tRNA PBSs and 3’ PPTs can be identified in retrotransposons. For example, copia has a PBS for methionine tRNA (Kikuchi et al., 1986) as does Ty (Eibel et al., 1981). IAP has a PBS for phenylalanine tRNA (Ono and Ohishi, 1983). Third, cytoplasmic nucleoprotein complexes or virus-like particles (VLPs), with associated reverse transcriptase activity and retrotransposon RNA, have been identified for Ty (Garfinkel et al., 1985), copia (reviewed in Echalier, 1989; Shiba and Saigo, 1983) and IAP (reviewed in Kuff and Lueders, 1988). In each case, retrotransposon-encoded gag-like proteins make up the VLP. Fourth, experiments on the mechanism of Ty transposition indicated that complete LTRs are regenerated during the process and that sequence polymorphisms are transferred from the 5’ end of the transcript to the newly synthesized 3’ LTR and from the 3’ end of the transcript to the 5’ LTR (Boeke et al., 1985; Muller et al., 1991). These patterns of inheritance are exactly what one would predict based on the model for retroviral reverse transcription described above. 
Finally, that retrotransposons do indeed transpose via an RNA intermediate has been demonstrated experimentally for Ty elements (Boeke et al., 1985; Curcio and Garfinkel, 1991), IAP elements (Heidmann and Heidmann, 1991) and for defective retroviral proviruses (Tchenio and Heidmann, 1991). In each case it was demonstrated that new copies derived from an element marked with an artificial intron had the intron sequences precisely removed.
1.1.3 Integration
Integration is an obligate part of the retroviral life cycle and occurs by a precisely defined mechanism mediated by a retrovirally encoded integrase. This integrase is derived by proteolytic cleavage from the 3’ portion of the gag-pol polyprotein (Luciw and Leung, 1992). Retroviruses have evolved a specific integration mechanism because higher eukaryotic cells do not have an efficient mechanism for precisely inserting exogenous DNA into the cell genome. Transfection and microinjection studies with cloned DNA molecules reveal that cells utilize an illegitimate recombination mechanism for integrating foreign DNA, which undergoes random deletion and rearrangement inside the cell.
The current model for integration of viral DNA into the host cell genome is shown in Figure 1-9 and has recently been reviewed (Luciw and Leung, 1992). Background information for the model is drawn from in vivo (i.e. cell culture) as well as in vitro studies of several different retroviruses (see Luciw and Leung, 1992 and references therein). Not all aspects of the integration pathway have been rigorously demonstrated for any one retrovirus and many important issues remain to be elucidated. Briefly, the following steps are carried out: (1) In the cytoplasmic nucleoprotein complex, the integrase (IN) recognizes and binds to the short inverted repeats at the ends of the double stranded linear viral DNA (i.e. the viral att sites). 
IN nicks the blunt ends of the linear viral DNA, removing two bases (TT) and generating recessed 3’ ends. [Figure 1-9. Model for integration of viral DNA into the host-cell genome. This figure emphasizes the interactions of the ends of linear viral DNA molecules with host-cell DNA during the integration process. See text for details. From Luciw and Leung, 1992.] (2) After transport of the nucleoprotein complex to the nucleus, the complex attaches to a site in target cell DNA. IN probably mediates selection of the target site and numerous preferred sites have been identified within the cell genome; however, structural features of chromatin may influence site selection. Subsequently, IN makes a staggered cut in cell DNA and catalyzes single strand joining of the recessed CA 3’ ends of viral DNA to the cell DNA 5’ overhangs. These cleavage and joining reactions probably take place in a concerted fashion and require no added energy source. (3) Finally, host cell enzymes remove the two mismatched (AA) bases at each 5’ terminus of viral DNA, fill in the single stranded gaps in the target DNA, and ligate the remaining ends of viral and target DNA. The result is an integrated provirus flanked by a short target site duplication. These short direct repeats are 4-6 bp in length depending upon the retrovirus. 
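Why a staggered cut followed by gap repair yields a direct repeat can be seen in a minimal Python sketch. It tracks only the top strand, and the function name, the toy target sequence, the cut position and the 4 bp stagger are all assumptions chosen for illustration; the lowercase bases stand in for internal viral sequence between the TG and CA termini.

```python
# Minimal sketch (top strand only): a staggered cut of `stagger` bp leaves
# single-stranded gaps on both sides of the insert; gap repair copies the
# same target bases on each side, producing the target site duplication.
# All names and sequences here are hypothetical.

def integrate(target, provirus, cut_pos, stagger):
    if not 0 <= cut_pos <= len(target) - stagger:
        raise ValueError('cut site out of range')
    left = target[:cut_pos]
    dup = target[cut_pos:cut_pos + stagger]   # bases between the two nicks
    right = target[cut_pos + stagger:]
    return left + dup + provirus + dup + right

target = 'GGATCCTTAGCAAGCTT'       # toy host sequence
provirus = 'TG' + 'nnnn' + 'CA'    # processed viral ends, toy interior
product = integrate(target, provirus, cut_pos=6, stagger=4)
print(product)
```

Changing `stagger` changes only the length of the flanking repeat, which is consistent with the repeat size being set by the integrase rather than by the host cell.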
Since the size of the direct repeat is characteristic for the infecting virus and does not depend on the cell type, it is IN that controls the size of the staggered cut.
Studies on proviruses in infected cells show that many sites in the cell genome are available for integration and that very little sequence preference is exhibited. However, integration is not completely random, as integration into certain regions can occur more (e.g. Isfort et al., 1992) or less (e.g. King et al., 1985) frequently than expected if events were entirely random. Also, integration hot spots have been reported (Shih et al., 1988; Craigie, 1992; Ji et al., 1993). The chromatin topology of target cell DNA may influence retroviral integration. For example, proviruses are found more frequently in actively transcribed regions and in DNaseI hypersensitive sites than in regions of closed chromatin (Vijaya et al., 1986; Rohdewohld et al., 1987; Brown, 1990; Scherdin et al., 1990; Luciw and Leung, 1992). Pryciak and Varmus (1992) reported that integration into naked DNA targets is nonuniform, implying a nucleotide sequence basis. Also, chromatin assembly enhances the reactivity of many sites, so that integration occurs more frequently at sites in nucleosomal, rather than nucleosome-free, regions of minichromosomes. In contrast, integration was found to be prevented in a region occupied by a site-specific DNA-binding protein. Finally, in chromatin, integration occurs preferentially at positions where the major groove is on the exposed face of the nucleosome DNA helix (Pryciak and Varmus, 1992).
Transposition by retrotransposons also results in the generation of a target site duplication, the length of which is characteristic of the particular retrotransposon family. For example, integration of copia is associated with a 5 bp target site duplication (Dunsmuir et al., 1980) as is Ty (Boeke et al., 1985). IAP integration is associated with a 6 bp duplication (Heidmann and Heidmann, 1991). 
This suggests a similar integration mechanism mediated by a retrotransposon-encoded integrase activity. Sequence homology to retroviral integrases has been identified in the pol regions of each of these retrotransposons (reviewed in Echalier, 1989; Garfinkel, 1992; Kuff and Lueders, 1988). Associated integrase activities have also been identified.
The main distinction between integrated retroviral proviruses and retrotransposons is that the latter do not have an extracellular stage in their life cycle, i.e. they are not infectious. This has often been attributed to the observation that most retrotransposons do not possess an env domain. While this is true for Ty and copia elements, other retrotransposon families do contain elements that have a 3’ ORF which corresponds in size and location to retroviral env genes. For the majority of IAP elements and the various HERV families, these 3’ ORFs contain numerous mutations that render them nonfunctional (reviewed in Kuff and Lueders, 1988; Larsson et al., 1989). However, it should be noted that the mouse does contain ~200 copies of an element termed IAPE (IAP envelope) that contains a retroviral-like ORF (Reuss and Schaller, 1991). For other families such as gypsy, 17.6 and 297 in Drosophila, the 3’ ORF has never revealed any homology to the sequences that are conserved between the env proteins of retroviruses (Echalier, 1989). In fact, it has been suggested that the third ORF might have originated in a nonretroviral sequence (Inouye et al., 1986).
1.1.4 Cis-acting sequences
Retrotransposition requires a number of protein functions, including a reverse transcriptase (with associated RNaseH activity) and an integrase. However, the majority of retrotransposons that have been sequenced to date have been shown to be translationally defective. It is thought that the protein functions required for the retrotransposition of these defective elements can be supplied in trans. Trans complementation is the basis for retroviral gene transfer technologies. 
However, there are cis-acting elements that function in retroviral gene expression, reverse transcription and virion assembly (Fig. 1-7 and 1-10; reviewed in Luciw and Leung, 1992). These must be retained by the defective element for it to remain transpositionally competent.
The retrotransposon must be expressed and therefore must retain functional transcriptional regulatory signals in the LTRs (as discussed above; Fig. 1-7). A number of cis-acting elements are required for reverse transcription, including the 5’ PBS and 3’ PPT, the 5’ U5 and leader (L) regions, the 5’ packaging signal (Ψ), and the 5’ and 3’ R regions (Fig. 1-10). Near the 5’ end of the genome is the minus-strand PBS that is complementary to 16-19 bases at the 3’ end of the specific tRNA molecule that serves as the primer for first strand synthesis. However, initiation of reverse transcription also requires the interactions of U5 sequences with (1) the 5’ untranslated L sequence that precedes the initiation codon for gag (Cobrinik et al., 1988), (2) the inverted repeats proximal to the PBS (Cobrinik et al., 1991), and (3) the TΨC loop sequence in the tRNA primer (Aiyar et al., 1992). The resultant stem-loop arrangements are required for efficient initiation of reverse transcription. The short repeated sequence designated R, which is located at each end of the viral genome (see Fig. 1-10), is required for the first strand transfer during reverse transcription. A short PPT near the 3’ end of the viral genome is required as the primer binding site for plus-strand DNA synthesis. Finally, for retroviruses, all steps in reverse transcription take place in a nucleoprotein complex in the cell cytoplasm. 
Retrotransposons, although they remain intracellular, still apparently require packaging within such a complex for retrotransposition (reviewed in Echalier, 1989; Garfinkel, 1992; Kuff and Lueders, 1988). [Figure 1-10. Cis-acting elements in genomic viral RNA. Panel A: virion RNA genome; panel B: viral DNA genome (provirus).] Retroviruses have cis-acting signals, or packaging signals (Ψ), as well as trans-acting virion core proteins that result in efficient encapsidation of genomic RNA into virions (Linial and Miller, 1990).
The Ψ sequences in viral RNA have been localized through mutational analyses of viral genomes. Some differences with respect to locations of Ψ elements in genomic RNA are observed in comparisons of avian sarcoma-leukemia virus (ASLV) with MLV and reticuloendotheliosis virus (REV) (reviewed in Linial and Miller, 1990). One main difference is the position of Ψ with respect to the viral splice donor (SD) site. For both MLV and REV, Ψ is a 300 bp sequence that is located in the L region immediately downstream from the SD site. Subgenomic env mRNAs for these viruses are not packaged because Ψ is in the env intron. The analogous Ψ signal in ASLV is 150 bp long and is located upstream of the SD site. This Ψ sequence would thus be present in subgenomic env mRNAs; yet env mRNAs are not efficiently packaged. Other important packaging element(s) not present in these subgenomic env transcripts must also exist. In fact, sequences in the 5’ end of gag have been shown to lead to an increased packaging efficiency for both ASLV and MLV (Bender et al., 1987; Armentano et al., 1987; Adam and Miller, 1988). This has been referred to as the extended packaging signal, Ψ+ (Bender et al., 1987). A second difference between ASLV and MLV is that efficient assembly in ASLV also requires a segment 100 bases long that forms a direct repeat flanking the src gene. 
An analogous 3’ element in MLV has not been identified.
Alignment of packaging regions of REV, MLV and VL30 (an endogenous retrovirus-related sequence in mouse) revealed only a short stretch (12 bp) of homology, suggesting that there is no specific sequence that can be designated the “Ψ sequence”. Potential RNA secondary structures in MLV Ψ and ASLV Ψ+ have been identified, but additional mutational studies are required to establish their importance in packaging.
Xu and Boeke (1990) have used deletion mutants of Ty to determine the sequences required in cis for transpositional competence and have found that, as with retroviruses, all required sequences are located within or near the LTRs. The smallest mini Ty1 element capable of transposition contained the 3’ LTR and the transcribed portion of the 5’ LTR, 285 bp of internal sequence 3’ to the 5’ LTR (putative 5’ PBS and packaging signal) and 23 bp of internal sequence 5’ to the 3’ LTR (putative 3’ PPT).
Finally, integration of the linear double stranded viral DNA intermediate requires the presence of the viral att sites, the short inverted repeats that bound the LTRs and that are recognized by the retroviral integrase (see Fig. 1-7). Mutation of these viral att sites drastically reduces the efficiency of integration (Luciw and Leung, 1992). All retroviral LTRs are bounded by 5’ TG ... CA 3’. This has also been observed for the LTRs of retrotransposons such as Ty, copia and IAP (reviewed in Garfinkel, 1992; Echalier, 1989; Kuff and Lueders, 1988). There are exceptions, such as a distinct group of Drosophila retrotransposons that includes 17.6, 297 and gypsy, and which, in contrast to other retrotransposons, displays specificity for its insertion sites. These elements are bounded by 5’ AGT ... AC 3’, and this is assumed to be correlated with the DNA sequence specificity of their target (Echalier, 1989).
1.1.5 Retrotransposition is mutagenic
Elements undergoing retrotransposition experience a high rate of mutation. 
One reason is that reverse transcriptase lacks a 3’-to-5’ exonuclease proofreading mechanism (Whitcomb and Hughes, 1992). Measurements of the fidelity of purified preparations of reverse transcriptase have been made by determining the frequency of nucleotide (nt) misincorporations in reactions utilizing defined RNA or DNA templates (see Williams and Loeb, 1992; Coffin, 1992). Examples of error rates obtained for reverse transcriptases are 1 misincorporated nt per 9000-17,000 nt for ASLV, 1 per 30,000 nt for MLV, and 1 per 1700 to 1 per 4000 nt for HIV. For comparison, for E. coli DNA polymerase, an enzyme having a proofreading capacity, the error rate was 1 per 100,000 nt. The error rates did vary, however, depending upon experimental conditions and the types of templates and reverse transcriptase preparations used.
More useful values for mutation rates come from direct measurements during viral infection of cells. Error rates of ~10^[exponent illegible] per base per replication cycle have been reported for avian viruses on passage through chicken cells, and ~10^[exponent illegible] for MLV. This contrasts with the spontaneous mutation rate in eukaryotic cells of 10^[exponent illegible] to 10^-10 mutations per nt per cell division. For example, the mutation rate for the v-mos gene of Moloney murine sarcoma virus (Mo-MSV) has been estimated at 1.31 x 10^-3 nt substitutions per site per year, while that of c-mos is 1.71 x 10^-9, a million-fold difference in mutagenic rates between retroviral and mammalian genomes (Gojobori and Yokoyama, 1985). There is evidence that retrotransposons also show a higher mutation rate than cellular DNA. Aota et al. (1987) reported the mutation rate for IAP elements to be 6-10 x 10^-9 nt substitutions per site per year, compared to the rate of 1 x 10^-9 estimated for cellular genes. 
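The figures quoted above are easier to compare after a little arithmetic. The Python sketch below converts the in vitro error rates into the expected number of misincorporations in one copy of a genome, and computes the fold differences between the evolutionary rates. The 8000 nt genome length is an assumption made purely for illustration, the function name is hypothetical, and the v-mos, c-mos and IAP rates are entered with negative exponents (10^-3 and 10^-9), the reading consistent with the million-fold difference stated in the text.

```python
# Worked arithmetic on the rates quoted in the text.
# GENOME_NT is an assumed, illustrative genome length, not a value
# taken from the source.

def expected_errors(genome_nt, nt_per_error):
    '''Expected misincorporations when copying genome_nt nucleotides once.'''
    return genome_nt / nt_per_error

GENOME_NT = 8000

# In vitro fidelity: one misincorporated nt per this many nt copied.
nt_per_error = {
    'ASLV RT (best case)': 17000,
    'ASLV RT (worst case)': 9000,
    'MLV RT': 30000,
    'HIV RT (best case)': 4000,
    'HIV RT (worst case)': 1700,
    'E. coli DNA polymerase': 100000,
}
for enzyme, n in nt_per_error.items():
    print(f'{enzyme}: {expected_errors(GENOME_NT, n):.2f} errors per copy')

# Substitutions per site per year (Gojobori and Yokoyama, 1985).
v_mos = 1.31e-3
c_mos = 1.71e-9
fold_mos = v_mos / c_mos
print(f'v-mos / c-mos: {fold_mos:.2e}')

# IAP elements versus cellular genes (Aota et al., 1987).
iap_low, iap_high, cellular = 6e-9, 10e-9, 1e-9
print(f'IAP / cellular: {iap_low / cellular:.0f}- to {iap_high / cellular:.0f}-fold')
```

The v-mos to c-mos ratio comes out near 7.7 x 10^5, i.e. of the order of a million, whereas the IAP rate exceeds the cellular rate by only about 6- to 10-fold.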
This observed mutation rate for endogenous IAP elements, though higher than the mutation rate for cellular DNA, is still nowhere near that for replicating retroviral genomes.
In addition to misincorporation errors, retroviruses are also subject to a high frequency of errors due to sequence rearrangements such as insertions, deletions and reduplications (Coffin, 1992). Certain regions are particularly prone to such rearrangements because of the presence of flanking short repeated sequences that can mediate homologous recombination events. Retroviruses also exhibit extraordinarily high rates of recombination (Coffin, 1992). Recombination occurs during the reverse transcription process and is a consequence of the dimeric viral genome and the ability of reverse transcriptase to switch templates during reverse transcription (Hu and Temin, 1990).
The mutation rate of retroviruses is so great that retroviruses evolve as a “quasispecies”, i.e. a distribution of mutants rather than unique genomes that identify clonal populations (Williams and Loeb, 1992). The process of retrotransposition has also been shown to be mutagenic for retrotransposons. For example, comparisons of marked Ty elements before and after movement showed a loss of certain restriction sites (Boeke et al., 1985). Comparison of the LTRs of transposed IAP copies with those of the parental IAP element revealed several types of sequence changes, including conversion to sequences specific to endogenous IAP elements (Heidmann and Heidmann, 1991). This latter observation suggested that, as with retroviruses, two IAP RNAs are packaged within each cytoplasmic particle and recombination can occur between the two genomes, in this case between an experimentally marked IAP and an endogenous IAP.
1.1.6 Rates of retrotransposition
Rates of retrotransposition for several retrotransposons have been estimated in experimental systems (Boeke et al., 1985; Curcio and Garfinkel, 1991; Heidmann and Heidmann, 1991; Tchenio and Heidmann, 1991, 1992). 
When induced to high levels of expression by linkage to the GAL1 promoter, a marked Ty element has been shown to transpose at a rate of between 3 x 10^-7 and 1 x 10^-5 transpositions per Ty element per generation (Curcio and Garfinkel, 1991). Transpositions were not detected if expression of the marked Ty element was not induced, suggesting that transposition frequency is dependent upon the level of expression of the retrotransposon. The rate of transposition of a marked IAP element was estimated to be ~[value illegible] events per cell per generation (Heidmann and Heidmann, 1991). For a marked defective retrovirus, the frequency of transposition was estimated at 10^-6 (Tchenio and Heidmann, 1991). A second report of transposition frequency of a marked defective retrovirus gave estimates of 10^[exponent illegible] to 10^-6 and was dependent upon the level of expression of the gag-pol gene of the helper virus (Tchenio and Heidmann, 1991). A similar experiment done in the absence of exogenously added viral genes detected transposition of the reporter gene at a frequency of 10^-8 to 10^-6 events per cell per generation, which turned out to represent the frequency of pseudogene formation by the reporter gene (Tchenio et al., 1993).
1.2 LINEs - L1 elements
L1 elements have several structural features that suggest they transpose via an RNA intermediate (reviewed in Hutchison et al., 1989). First, L1 elements are flanked by short direct repeats that presumably represent a target site duplication. Second, L1 elements possess a 3’ A-rich tail which is preceded by a polyadenylation signal AATAAA. This suggests that L1 elements are reverse transcribed copies of a polyA+ mRNA. Third, the conserved ORF-2 contains amino acid (aa) sequence motifs characteristic of reverse transcriptases (Hattori et al., 1986) and recently, a functional L1 element has been isolated that does encode a reverse transcriptase activity (Mathias et al., 1991). 
This has led to a model for L1 retrotransposition that involves: (1) synthesis of full length, polyadenylated L1 transcripts; (2) reverse transcription of the RNA by an L1-encoded enzyme; and (3) insertion into staggered chromosome breaks. 5’ truncated elements would result from incomplete reverse transcription. That 5’ truncation commonly occurs during the process of transposition and does not simply represent degeneration during the course of evolution is further illustrated by the fact that two new L1 insertions are 5’ truncated elements. That L1 and L1-related elements do indeed undergo retrotransposition has been experimentally demonstrated for the LINE-related I element in Drosophila (Jensen and Heidmann, 1991; Pelisson et al., 1991) and a mouse L1 (Evans and Palmiter, 1991) using marked elements: intron sequences placed within these elements are precisely removed during the generation of new copies, indicating an RNA intermediate. However, the mechanistic details are not known.
1.2.1 Transcription
Full length discrete L1 transcripts have been detected in polyA+ RNA from the human teratocarcinoma cell line NTera2D1 (Skowronski and Singer, 1985) and primer extension studies have aligned the 5’ end of the RNA with the consensus left end of genomic L1 elements (Skowronski et al., 1988). Because a typical upstream RNA pol II promoter would be lost during a cycle of transcription and reverse transcription, it has been proposed that L1 elements must possess an internal promoter (Skowronski et al., 1988). An internal RNA pol II promoter activity has been identified in the 5’ untranslated region of L1 elements (Minakami et al., 1992; Swergold, 1990). 5’ truncation would result in loss of this internal promoter and render that element transcriptionally inactive. Even for full length elements, it appears that only a subset are actually transcribed. For example, it does not appear that elements containing the extra 132 bp in the 5’ UTS are transcribed (Scott et al., 1987). 
Also, elements most readily transcribed have a distinctive consensus sequence in their 3’ UTS and are designated as subset T, for transcribed (Skowronski et al., 1988).
1.2.2 Reverse transcription / integration
Until recently, all L1 elements that had been sequenced had been shown to be disrupted by numerous deletions and point mutations. However, construction of an L1 consensus had predicted two ORFs, with ORF2 containing regions of homology to reverse transcriptases. Dombroski et al. (1991) have recently isolated a functional L1 element, the likely progenitor of the factor VIII gene insertion. This L1 element contains two intact ORFs. Functional studies with experimental constructs of the L1 element indicate that both ORFs can produce protein products (Mathias et al., 1991; Holmes et al., 1992). ORF1 encodes a 40 kD protein of uncertain function. No homology has been detected between the deduced structure of p40 and typical gag polypeptides (Holmes et al., 1992). The central region of p40 does contain sequences that can, in principle, form a leucine zipper (Holmes et al., 1992). ORF2 has been shown to encode a reverse transcriptase activity (Mathias et al., 1991). These findings support the model in which L1-encoded reverse transcriptase is responsible for the reverse transcription of L1 RNA. Defective L1 elements may still be transpositionally competent if protein functions could be supplied in trans by functional L1 elements. However, analysis of mouse L1 (L1Md) sequences indicates that elements capable of transposition contain intact ORFs (Hardies et al., 1986), i.e. that it is L1-encoded cis-acting proteins that mediate L1 transposition. The one functional L1 element identified to date in humans was the progenitor element of one of the L1 insertions into the factor VIII gene (Kazazian et al., 1988), which also suggests that use of protein products in cis is preferred in L1 retrotransposition. 
This has also been demonstrated for Drosophila I factor retrotransposition (Pelisson et al., 1991; Jensen and Heidmann, 1991).
The primer(s) for L1 cDNA synthesis has not been described. It has been proposed that a 3’ end of the DNA nicked or broken at the insertion site serves as primer for the 3’ end of L1 RNA (Hutchison et al., 1989). Such a priming event has been directly shown for the non-LTR retrotransposon R2 from Bombyx mori (Luan et al., 1993) and the term ‘RNA-mediated integration’ was suggested to refer to this method of retrotransposition (Luan et al., 1993). However, during a study to detect retrotransposition of a marked L1Rn element, reverse transcribed copies of the marked L1 were isolated as extrachromosomal elements, suggesting that reverse transcription of the L1 element takes place prior to integration (Segal-Bendirdjian and Heidmann, 1991). It cannot be ruled out, therefore, that L1 cDNA synthesis is primed by a small cellular RNA, as is the case for retroviral reverse transcription.
One model for the reverse transcription of L1 RNA is as follows (Hutchison et al., 1989): The L1 protein products from ORF-1 and ORF-2 remain associated with the L1 transcript after translation to form a nucleoprotein complex. This complex moves to the nucleus where ORF2, which has a region of homology to nucleic acid binding domains, mediates binding to DNA at the site of a nick or a break. Homology to the endonuclease domains of retroviral pol genes has not been reported for either ORF1 or ORF2. Therefore it appears that L1 does not encode an integrase function but must rely on the presence of a pre-existing nick or break. The variable length of the target site duplication (generally ranging from 5 to 15 bp) also suggests that the location of the breaks is not highly specific and therefore not the result of a specific L1-encoded integrase/nuclease, as is the case for retroviruses and viral retrotransposons. 
A nick or a break could provide a 3’ end from which the first strand of cDNA synthesis is primed. The other end of the broken target DNA associates with the 5’ end of the L1 RNA either by ligation or base pairing. Reverse transcription continues and the newly synthesized DNA strand is ligated to the 5’ terminus of this broken end. Integration is completed by repair synthesis that produces the second strand of the L1 cDNA, accompanied by degradation of the RNA template by RNaseH. Similar models have been proposed for the retrotransposition of I elements in Drosophila (Finnegan, 1989b) and Cin4 in Zea mays (Schwarz-Sommer et al., 1987).
For retroviruses and presumably viral retrotransposons, reverse transcription occurs in association with a nucleoprotein complex. The above model for L1 retrotransposition also incorporates such an association for reverse transcription/integration. In support of this aspect of the L1 model, Deragon et al. (1990) have demonstrated that full length L1 transcripts and reverse transcriptase are associated together in cytoplasmic particles which could represent an intermediate in L1 transposition. The major protein component of this macromolecular complex is 37 kD. This is similar to the 40 kD protein encoded by L1 ORF1 (Holmes et al., 1992), which was reported to be the same as the endogenous 40 kD protein detected in NTera2D1 cells, a protein which migrated in denaturing gels as if it were 38 kD (Leibold et al., 1990). Similar observations of ribonucleoprotein particles containing L1 ORF1 proteins and associated with L1 RNA have been made in mouse embryonal cells (Martin, 1991).
1.3 SINEs - Alu elements
Like L1 elements, Alu repeats possess several structural features that strongly suggest they retrotranspose (reviewed in Schmid and Maraia, 1992). They contain 3’ A-rich tails and are flanked by direct repeats. Also, their precisely defined 5’ end corresponds to the start site for RNA pol III. 
Thus it appears that Alu elements represent retrotransposed pol III transcripts.
1.3.1 Transcription
Alu elements contain an internal RNA pol III promoter, as determined by a match to the consensus promoter sequence and confirmed in many cases by in vitro transcription assays using RNA pol III (Deininger, 1989). The internal promoter directs transcription to start at the 5’ end of the Alu element. Transcription by RNA pol III proceeds through the element and, because Alu elements do not contain a transcription termination signal, continues into the 3’ flanking sequence. Termination occurs at a short poly(T) tract, the consensus RNA pol III terminator (Bogenhagen et al., 1980), which is expected to occur fairly readily by chance. Hence, the primary Alu transcript contains the full element plus a variable length of flanking sequence that ends in a short poly(U) tract.
Most Alu repeats appear to be transcriptionally inactive in vivo (Paulson and Schmid, 1986; Deininger, 1989). The exceptions are Alus comprising the youngest Alu subfamilies (Matera et al., 1990; Sinnett et al., 1992). Several explanations have been put forth to explain the transcriptional inactivity of Alu elements. One reason may be that most Alus have retrotransposed into inactive chromatin domains (Schmid and Maraia, 1992). A second reason is the possible requirement for upstream activating sequences in addition to the internal promoter. If Alu elements retain this dependence, then only rarely would a retrotransposed Alu be inserted near appropriate activating sequences (see section 1.1). A third reason for transcriptional inactivity may be methylation (Liu and Schmid, 1993). Young Alu elements are very rich in CpG dinucleotides. Methylation of these CpG dinucleotides may serve to repress Alu transcription. Also, methylated CpG dinucleotides show a rapid transition to TpG (or equivalently CpA). This may be responsible for the permanent silencing of older Alu elements, which do contain fewer CpGs (Liu and Schmid, 1993). 
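The relationship between element age and CpG content described here can be illustrated with a short Python sketch that counts CpG dinucleotides, computes the commonly used observed/expected CpG ratio, and models complete CpG to TpG decay on one strand. The sequence is a short toy string chosen for illustration and is not taken from the source; the function names are likewise hypothetical, and real decay is gradual and affects both strands (giving CpA on the complementary strand).

```python
# Sketch: CpG content before and after modelled deamination.
# Function names and the toy sequence are assumptions for illustration.

def cpg_stats(seq):
    '''Return (CpG count, observed/expected CpG ratio) for one strand.
    Expected count is (number of C x number of G) / sequence length.'''
    seq = seq.upper()
    n = len(seq)
    observed = seq.count('CG')
    expected = seq.count('C') * seq.count('G') / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return observed, ratio

def decay_cpg(seq):
    '''Model complete loss of methylated CpG on this strand: CG -> TG.'''
    return seq.upper().replace('CG', 'TG')

young = 'GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCA'  # toy CpG-rich sequence
old = decay_cpg(young)

for label, s in (('young', young), ('old', old)):
    count, ratio = cpg_stats(s)
    print(f'{label}: {count} CpG, observed/expected = {ratio:.2f}')
```

In this toy case every CpG is lost, whereas the text indicates only that older Alu elements retain fewer CpGs than young ones.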
Developmental differences in the methylation of Alu repeats have also been reported, with young Alu elements appearing to be almost completely unmethylated in sperm DNA (Hellmann-Blumberg et al., 1993).

1.3.2 Reverse transcription

Alu elements do not encode a protein function. Therefore, the reverse transcriptase activity must come from another source, either from infecting retroviruses, endogenous proviruses or other retroelements such as L1 elements. For retroviruses and viral retrotransposons, reverse transcription occurs within a cytoplasmic nucleoprotein particle. It has been proposed that Alu elements retain a secondary structure that facilitates packaging and thus retrotransposition. A complex secondary structure, first identified in 7SL RNA, has been conserved in both Alu and B1 elements, despite significant divergence in primary sequence (Sinnett et al., 1991; Labuda et al., 1991).

Reverse transcription also requires a primer, and Alu transcripts may be capable of self-priming their own reverse transcription. As discussed above, Alu transcripts have additional 3’ flanking sequence ending in a short poly(U) tract. These U’s may fold back on the A-rich 3’ end of the Alu sequence and prime reverse transcription. This would result in a DNA copy of the Alu transcript with none of the flanking region. It has been observed for B1 elements that the primary transcript can undergo 3’ processing that removes the poly(A) and oligo(U) tracts, depending upon the sequence surrounding the termination signal (Maraia et al., 1992). Under the above model, such 3’ processing would eliminate the possibility of self-priming and drastically reduce the potential for reverse transcription.
This could represent another point of regulation for Alu retrotransposition (Schmid and Maraia, 1992). However, it cannot be excluded that reverse transcription may be primed directly by the genomic DNA at the target site of integration, as has been suggested for the priming of L1 reverse transcription.

1.3.3 Integration

Alu elements are flanked by short direct repeats that vary in size from a few bp to >30 bp in length and are generally A-rich (Deininger, 1989). The length variation in the direct repeats suggests that, as with L1 integration, Alu elements take advantage of pre-existing nicks or breaks in the DNA for integration. Nicks and breaks can be generated by nonspecific cellular enzymes such as topoisomerases, by activities of DNA replication and repair, or by chemicals and radiation. The observation that Alu integration sites tend to be A-rich and that the A-richness is predominantly 3’ of the element suggests that integration is not random and that some interaction/pairing between the 3’ flanking A-rich region and the A-rich 3’ end of the Alu promotes ligation and gap repair (Deininger, 1989).

2. REGULATION OF RETROTRANSPOSITION BY THE HOST CELL

Controlling the level of retrotransposition is necessary for the genetic stability of the host cell. Regulation of retrotransposition by host cell factors has long been suggested by several observations. One is the observed mobilization of specific transposable elements in association with the hybrid dysgenesis phenomenon in Drosophila. There are three independent systems of hybrid dysgenesis. Two involve the transposon families P and hobo. The third involves I elements, non-LTR-containing LINE-related retrotransposons (Bucheton, 1990). This phenomenon apparently results from crossing fly stocks that differ in their capacity to repress the transposition of certain transposable elements.
A second observation suggesting regulation by host cell factors is that retroelements are differentially expressed in different cell types. For example, IAP elements in mice show tissue-specific and age-specific regulation of expression (Gaubatz et al., 1991). Copia-like elements of Drosophila are also developmentally regulated, with different families of copia-like elements being transcribed during different developmental stages (Echalier, 1989). Human retroelement families, including Alu, L1 and HERVs, are also tightly regulated. In general, the expression of these retroelements has been found to be highest in one or more of the following: placental tissue, germline cells, and transformed cells.

2.1 Transcriptional regulation

Expression is the first requirement for retrotransposition, and it has been demonstrated that inducing high levels of transcription results in increased rates of transposition (Boeke et al., 1985). Transcriptional regulation by host cell factors has been identified for several retrotransposons (see Garfinkel, 1992; Echalier, 1989; Boeke and Corces, 1989). In Drosophila, for example, su(Hw) (suppressor of Hairy wing) and su(f) (suppressor of forked) encode DNA binding proteins that have been shown to bind to specific sequences within the gypsy element and positively and negatively, respectively, regulate gypsy transcription (Parkhurst and Corces, 1985; 1986). An example in yeast is SPT3, which encodes a factor required for proper initiation of transcription at the Ty LTR promoter; spt3 mutants accumulate Ty transcripts initiated at other sites within the element (Garfinkel, 1992). The transcription of murine endogenous MLV-related sequences appears to be regulated by at least two trans-acting factors, Gv-1 and Gv-2 (Wilson et al., 1988).
IAP transcription, often detected in primary mouse tumors and consistently high in neoplastic cells (reviewed in Kuff and Lueders, 1988), may be due to trans-activation by oncogene products in these transformed cells (Horowitz et al., 1984). Luria and Horowitz (1986) found that a cloned IAP LTR responded to the co-expression of all four nuclear oncogenes tested: SV40 large T antigen, adenovirus E1a, c-myc and p53. c-myc expression has been widely linked to the proliferative state, and there is a good correlation between IAP expression and elevated levels of c-myc transcripts in thymus, stimulated B cells and plasmacytomas (Mushinski et al., 1983; Marcu et al., 1983). RTVL-H LTRs have also been shown to be upregulated by SV40 large T antigen (Feuchter et al., 1992). Finally, the HERV element ERV3 has been shown to be downregulated in choriocarcinoma cells, possibly due to the loss of a specific host factor upon oncogenic transformation (Kato et al., 1988). Regulation of retroelement transcription by host cell factors is not surprising in that LTRs have been shown to contain classical upstream enhancer sequences.

It should be noted that the observed localization of many retroelements to regions of constitutive heterochromatin may also limit their transcription and hence their transposition (Kuff, 1988). Germline fixation of retroelements in these genetically inactive regions may be favored due to a reduced probability of insertional mutations.

2.2 DNA methylation

DNA methylation is known to suppress the transcriptional activity of cellular and viral genes (Bird, 1992; Cedar, 1988; Hsiao et al., 1986). Studies have shown that proviruses introduced into preimplantation mouse embryos become heavily methylated soon after their integration and their expression becomes permanently suppressed (Jahner et al., 1982; Aboud et al., 1992).
Such viruses can be activated by demethylating agents such as 5-azacytidine, indicating that hypermethylation is indeed involved in the suppression of these elements (Jaenisch et al., 1985; Aboud et al., 1992). There is also a strong correlation between IAP methylation and transcriptional activity (reviewed in Kuff and Lueders, 1988). IAP expression in two tumor cell lines was correlated with demethylation of the 5’ LTR (Feenstra et al., 1986). Also, in vitro methylation of the 5’ LTR abolished promoter activity towards a linked reporter gene (Feenstra et al., 1986). Methylation is also thought to be responsible, at least in part, for the transcriptional inactivity of most Alu elements (Liu and Schmid, 1993).

2.3 Post-transcriptional regulation

It does not appear that regulation of transposition occurs solely by transcriptional repression of retroelements, since many retroelements are expressed at high levels in certain cell types with no observable transposition. One obvious factor restricting transposition is the genetic defects in the elements themselves. Virtually all retrotransposons so far examined have been shown to be defective, containing multiple stop mutations, deletions, and rearrangements. Regulation of transposition may result simply from the accumulation of defective copies and the gradual loss of functional elements (Kuff, 1988; Garfinkel, 1992). However, defective copies can still transpose by complementation in trans (Tchenio and Heidmann, 1991). Thus, while the majority of retroelements may be translationally incompetent, these elements may still be capable of retrotransposition. For example, a recombinant vector strategy used to survey a large number of different genomic Ty elements indicated that over 70% of the recombinant Ty elements were transposition-competent (unpublished results cited in Garfinkel, 1992). This suggests that, in addition to transcriptional regulation, there may also be post-transcriptional regulation of retrotransposition.
This may include regulation of the translational frameshifting needed in many retroviruses and retrotransposons for the translation of the pol product from the gag-pol transcript (Garfinkel, 1992). Also, regulation may occur post-translationally through control of the proteolytic processing of the primary translation products. A study by Curcio and Garfinkel (1992) suggested that Ty transpositional activity is limited by the availability of Ty protease and possibly integrase. The IAP gag-pol precursor polypeptide also appears to remain largely unprocessed. This, coupled with the intracisternal localization of the IAP particles (i.e., physically separated from the cytosolic precursors required for DNA synthesis), may be responsible for the paucity of IAP transpositions (Kuff, 1988). Finally, a defect in proteolytic processing may be responsible for the failure of HERV-K encoded HTDV particles to become mature virions (Wilkinson et al., in press).

3. ACTIVATION OF RETROTRANSPOSITION BY EXOGENOUS FACTORS

As noted in the preceding section, retrotransposition is dependent on endogenous host cell factors that regulate transcription of retroelements and the translation and processing of retroelement-encoded proteins. However, external factors can also affect retroelement transcription and possibly retrotransposition.

3.1 Hormones

The transcription of certain retroviral proviruses can be regulated by steroid hormones. For example, MMTV expression is enhanced by progesterone or glucocorticoids and can be correlated with the presence, in the LTR of MMTV, of a hormone responsive element (HRE) (Hartig et al., 1993). A similar observation has been made for HERV-K (Ono et al., 1987b). The synthetic glucocorticoid dexamethasone has been shown to increase the transcription of IAP elements (Emanoil-Ravier et al., 1988). The Drosophila copia-like retrotransposons 1731 and 412 are repressed in the presence of the insect moulting hormone analogue ecdysterone (Peronnet et al., 1986).
Mouse VL30 retrotransposons are specifically expressed in steroidogenic cells in response to pituitary-derived trophic hormones (Schiff et al., 1991). Treatment of the teratocarcinoma cell line NTera2D1 with retinoic acid induces differentiation of the cells that subsequently results in the downregulation of L1 expression and the expression of HERVs ERV9 (La Mantia et al., 1991) and RTVL-H (Wilkinson, 1993). RRHERV-I is upregulated upon retinoic acid treatment of PA-1 teratocarcinoma cells (Kannan et al., 1991). IAP elements are also transcriptionally activated upon retinoic acid-induced differentiation of mouse F9 teratocarcinoma cells (Howe and Overton, 1986). This differentiation-specific expression is regulated by the IAP proximal enhancer (IPE) element located proximal to the IAP basal promoter (Lamb et al., 1992). The IPE is inactive in undifferentiated F9 cells, where it is repressed by endogenous factors. The IPE is active in differentiated cells, possibly via the binding of a 60 kDa positive regulatory protein. Adenovirus E1a protein can repress IAP expression in differentiated cells, with the repression again mapping to the IPE element (Lamb et al., 1992).

3.2 Environmental stresses

A number of environmental factors can induce the expression and transposition of proviruses and retrotransposons. Two such factors are anoxia (Anderson et al., 1988) and temperature. For example, heat shock in adult Drosophila induces a rapid and significant increase in copia-specific transcripts (Strand and McDonald, 1985).
In contrast, low temperatures stimulate transposition of yeast Ty1 elements (Paquin and Williamson, 1984). At higher temperatures, the transposition rate decreases until it is virtually undetectable at 37°C (Boeke, 1989), which most likely reflects the inherent thermolability of Ty1 reverse transcriptase (Garfinkel et al., 1985).

Exposure to a number of chemicals, such as halogenated pyrimidines, hydrogen peroxide and sodium azide, as well as exposure to UV light and ionizing radiation, can also activate proviruses and retrotransposons (Aboud et al., 1992; Favor and Morawetz, 1992; Kuff and Lueders, 1988; Strand and McDonald, 1985; Rolfe et al., 1986; Morawetz, 1987; Deragon et al., 1990; Servomaa and Rytomaa, 1990). It is interesting that provirus activation by DNA damaging agents requires a subsequent round of DNA replication. If the DNA lesions are repaired prior to DNA replication, no virus induction occurs (Aboud et al., 1992). The mechanisms of this induction are not known. However, one hypothesis is that endogenous retroviruses are suppressed by a repressor-like protein which can be inactivated by a cellular function involved in an “SOS response” to DNA lesions. The SOS response was first described for bacterial cells and involves a DNA damage-induced activation of a RecA protease that subsequently degrades the repressor of the genes involved in the SOS repair mechanism. Such a response has been shown to result in the activation of lysogenic phage lambda.

3.3 Cell culture

Initiation of cell culture is generally accompanied by higher rates of transposition. For example, in Drosophila, retrotransposons are overexpressed in cultured cells relative to expression in the whole organism. This increase in expression is usually observed in the initial culturing and may represent a stress response by the cells to the new environment (Echalier, 1989). Cell density can also affect retroelement activation.
In rat chloroleukemia cells maintained in suspension culture in vitro, L1 elements are spontaneously transcriptionally activated at about half of the maximal population density. About 24 hr later, an explosive amplification of the L1 element is seen in DNA: ~300,000 copies are inserted into apparently random locations in the cell genome, thus creating an outburst of lethal mutation (Servomaa and Rytomaa, 1988). MMTV expression can also be regulated by cell density in GR mouse mammary cells but not in NIH 3T3 fibroblasts. Regulation through cell-cell contact is mediated by binding sites in the HRE of the LTR. This effect is under negative control in NIH 3T3 cells (Hartig et al., 1993).

E. THE IMPACT OF RETROELEMENTS ON THE HOST

An ongoing debate regarding retroelements has been whether these elements perform any useful function for the host. One view is that such elements represent selfish DNA (Doolittle and Sapienza, 1980; Orgel and Crick, 1980). Selfish DNA has two basic properties: (1) the DNA tends to form additional copies of itself and spreads these within the genome, and (2) selfish DNA makes no specific contribution to the phenotype (Orgel and Crick, 1980). In fact, the presence of large amounts of selfish DNA could be expected to be disadvantageous if not detrimental to the host. First, there is the metabolic energy cost of replicating a large amount of ‘junk DNA’. Second, there are the detrimental effects of insertion mutations and element-mediated rearrangements.

1. AGENTS OF GENETIC DISEASE

Transposable elements have long been recognized as a major source of mutation in lower eukaryotes (reviewed in Berg and Howe, 1989). The role that retroelements may play in human disease is becoming more appreciated as reports of direct retroelement involvement in mutations resulting in disease accumulate.
Retroelement-associated mutations can represent either de novo insertions by the retroelement or element-mediated rearrangements.

1.1 De novo insertions

Retroelement transposition can result in insertional mutagenesis of cellular genes, and a number of germline insertions resulting in disease have been reported. De novo insertion of a truncated L1 element into exon 14 of the factor VIII gene resulted in hemophilia A in two unrelated patients (Kazazian et al., 1988). The insertion of a 5’ truncated L1 element into the 3’ end of exon 44 of the dystrophin gene has been reported in two brothers with Duchenne muscular dystrophy (Narita et al., 1993). This mutation, which originated in their mother, resulted in the skipping of exon 44 during splicing of the dystrophin mRNA precursor (Narita et al., 1993). A germline Alu insertion into the coding region of the factor IX gene has been reported in a case of hemophilia B (Vidaud et al., 1993). A de novo germline insertion of an Alu sequence into the NF-1 gene has been described in a patient with neurofibromatosis (Wallace et al., 1991). The Alu inserted into an intron and caused aberrant splicing of the NF-1 transcript (Wallace et al., 1991). An Alu insertion into the cholinesterase gene has also been reported (Muratani et al., 1991). Finally, an Alu insertion near a candidate for the Huntington’s disease (HD) gene has been reported in two HD patients (Goldberg et al., 1993).

De novo somatic insertions of retroelements have also been identified in several cases of cancer. An L1 insertion into intron 2 of the myc gene has been reported in a breast cancer case (Morse et al., 1988). An L1 insertion into the last exon of the adenomatous polyposis coli (APC) gene has been reported in a case of colon cancer (Miki et al., 1992). In both of these cases, the inserted element was found only in the tumor samples and not in normal tissue from the same patient.
This suggests that the retroelement insertion may have played a role in the etiology of the cancers in these patients.

There are no examples of human disease resulting from the de novo insertion of an HERV. The analysis of genomic DNA from two human breast cancer cell lines has revealed an additional band, not found in normal DNA samples, that hybridized to the HERV-K (UUU) element NMWV (May and Westley, 1989). The origin of this extra MMTV-related sequence is not known, but one possibility is that it may represent a recent integration (May and Westley, 1989). An RTVL-H element has been found to drive the expression of a calbindin-related gene in a cell line derived from a prostate metastasis (Liu and Abraham, 1991). It has been suggested that the aberrant expression of that gene may have contributed to the progression of the original tumor. However, whether this represents a new RTVL-H insertion has not been determined. In mice, several mutations have been identified that are caused by the germline insertion of an ERV element into or adjacent to a gene (Jenkins et al., 1981; Stoye et al., 1988; Adachi et al., 1993). In fact, it has been estimated that at least 5% of all spontaneously occurring recessive mutations in mice are caused by C-type provirus insertions (Stoye et al., 1988). Also, numerous examples of ERV transposition resulting in either loss of gene function or the up-regulation of gene expression have been detected in mouse tumors and mouse cell lines (reviewed in Kuff and Lueders, 1988; Favor and Morawetz, 1992).

1.2 Element-mediated rearrangements

Retroelements represent sequence repeats dispersed within the genome. Recombination events between homologous sequences can result in the deletion or duplication of the intervening sequences.
Several cases of hypercholesterolemia result from rearrangements within the low density lipoprotein (LDL) receptor gene, the observed deletions and duplications resulting from recombination between Alu sequences normally present in the LDL receptor gene locus (Hobbs et al., 1986; Horsthemke et al., 1987; Lehrman et al., 1987). A deletion in the adenosine deaminase (ADA) gene has also occurred via homologous recombination between Alu elements (Markert et al., 1988). Also, homologous sequences on different chromosomes can be sites for recombination events, as has been reported for a sex chromosome rearrangement in a human XX male (Rouyer et al., 1987) and for a rearrangement of the human tre oncogene identified in a cell line derived from a Ewing’s sarcoma (Onno et al., 1992). Both result from recombination events between Alu elements located on different chromosomes. Recombinations between homologous HERVs resulting in disease have not been documented. However, some HERVs may recombine with non-homologous sequences. For example, a THE-1 element has been identified at the deletion breakpoint in the dystrophin gene (Pizzuti et al., 1992). Alternatively, HERVs may predispose a genomic region to deletion. For example, one member of the RTVL-H family is located close to the breakpoints of three naturally occurring deletions in the β-globin gene cluster (Mager et al., 1985). The retroviral sequence S71 maps to 18q21, the same band as the major breakpoint region of t(14;18) chromosomal translocations in B-cell lymphomas (Leib-Mosch et al., 1990). The same chromosomal region is also involved in deletions frequently occurring in colon carcinoma (Vogelstein et al., 1988).

1.3 HERVs and autoimmune disease

A possible role for endogenous retroviral sequences has been suggested in the initiation of autoimmune diseases.
Two characteristic antigens recognized by autoantibodies in systemic lupus erythematosus and Sjogren’s syndrome have been found to share homology and cross-reactivity with retroviral gag proteins (Garry, 1990; Krieg et al., 1992). Other cellular proteins have been shown to cross-react immunologically with retroviral proteins, such as U1 small nuclear ribonucleoproteins with the p30 gag protein of MLV (Query and Keene, 1987) and the heterogeneous nuclear ribonucleoparticle protein A1 with gp70 env of MLV (Trauger et al., 1990). It is possible that the activation of HERV sequences followed by the expression of retroviral proteins may lead to the production of autoantibodies against immunologically related cellular proteins and the initiation of autoimmunity.

Replication-defective proviruses have also been shown to encode products having immunosuppressive and anti-inflammatory activity (Snyderman and Cianciolo, 1984). For example, the p15E retroviral env-encoded protein has been shown to inhibit monocyte-mediated cell killing by inactivation of IL-1 (Kleinerman et al., 1987).

2. FUNCTIONAL ASSIMILATION

If selfish DNA is disadvantageous to the host, the question becomes why natural selection has not eliminated all selfish DNA from the genome. It may be that the selective disadvantage associated with any particular stretch of nonfunctional DNA is very small, so that its elimination is a very slow process even on an evolutionary time scale. The mechanisms for deleting selfish DNA may also be inefficient. In the end, what results may be a dynamic balance. The uncontrolled spread of retroelements through the genome is limited by host mechanisms that regulate transposition as well as by the deleterious effects of retroelement-induced mutations and DNA rearrangements. Retroelements may persist in the genome because the negative selection they impose is not strong enough to offset a low basal level of transposition. Under this view, selfish DNA represents a not-too-harmful parasite within its host.
The alternate view is that transposable elements serve a function in the host genome. Selfish DNA by definition contributes little or nothing to the fitness of the organism. However, a study examining the fitness effects of Ty transposition in S. cerevisiae (Wilkes and Adams, 1992) demonstrated that in mixed cultures of strains with and without Ty elements grown under conditions that induce Ty transposition, Ty-containing clones came to predominate in the cultures, suggesting that one or more of their Ty transpositions must have given these cells a selective advantage over their Ty-negative counterparts. This suggests that the presence of transposable elements can confer a selective advantage and have a positive effect on the fitness of the host.

Many theories have been put forward as to the possible functions of transposable elements. Cavalier-Smith (1978) suggested that selfish DNA may acquire a nonspecific function which gives the organism a selective advantage. He proposed that excess DNA may be the mechanism the cell uses to slow development or to produce larger cells. Another proposal is that the distribution of retroelements along chromosomes might provide a pattern for the recognition of homologous chromosomes in the initial stage of pairing at meiosis (cited in Favor and Morawetz, 1992 and Hutchison et al., 1989). The observation that Alu elements and other mammalian SINEs are CpG rich, and that Alu elements are hypomethylated in the male germ line and in tissues which depend on the differential expression of the paternal genome complement for development, suggests a possible role in genomic imprinting (Hellmann-Blumberg et al., 1993). Other postulated functions for SINEs include origins of DNA replication, signals for RNA processing and stability, and control of transcription (see Deininger, 1989; von Sternberg et al., 1992). The insertion of L1 elements at sites of pre-existing DNA nicks and breaks may serve to repair such breaks (Pardue, 1991; Hutchison et al., 1989).
The sequences of the HeT family of D. melanogaster have been shown to insert into and “heal” chromosome breaks, effectively stabilizing the “wounded” chromosome (Biessmann et al., 1990). It has also been suggested that transposable elements might be involved in speciation events, evidence for this being the hybrid dysgenesis phenomenon of Drosophila.

Another function, proposed by Jurka (1989), is that retroelements may participate in an antiretroviral defense of the host. One speculation is that retroviral infection may induce Alu expression and that high levels of transcripts, each resembling small structural RNAs, may interfere with the production and transmembrane transport of retroviral proteins (Jurka, 1989). Induction of L1 expression may lead to lethal levels of transposition, as has been observed in rat chloroleukemia cells (Servomaa and Rytomaa, 1988), eliminating the infected cells (Jurka, 1989). Also, post-embryonic expression of HERVs could play a role in sensitizing the immune system to antigens shared by the HERV and a potential retroviral pathogen, thus providing the individual with protection against later infection.

It is interesting that HERV sequences tend to be most highly expressed in placental tissues. This has led to much speculation as to the role, if any, of HERVs in placenta (Johnson et al., 1990).
Possible functions include: env or gag products on trophoblastic cell membranes mediating intracellular interactions or adhesion during placentation; endogenous env proteins inhibiting the attachment of related exogenous infectious retroviruses; env proteins related to the protein p15E functioning in pregnancy-related immunosuppression, protecting the fetal allograft; reverse transcriptase activity functioning in gene amplification in syncytiotrophoblast; retroviral proteins or LTR promoter sequences influencing gene expression and cellular differentiation (Johnson et al., 1990).

The LTRs of the proviral HERV families and retrotransposons contain the transcriptional regulatory sequences that control the expression of the element. It is also possible that the presence of these regulatory sequences may affect the expression of adjacent cellular genes, supplying enhancers, promoters and polyadenylation signals. These regulatory sequences, over evolutionary time, may come to be functionally assimilated into the normal regulation of those cellular genes. In the mouse, numerous cases have been documented of somatic or tumor-specific retroviral insertions that have altered expression of a proto-oncogene (reviewed in Favor and Morawetz, 1992). Examples are also known in which an LTR adjacent to a cellular gene has been incorporated into the normal regulation of that gene. The best characterized of these examples in human occurs within the salivary amylase gene cluster. Insertion of a HERV-E element upstream of a duplicated parotid amylase gene resulted in a new tissue specificity, an amylase gene now expressed in the salivary gland, due to enhancer sequences in the 5’ internal region of the upstream HERV-E element (Samuelson et al., 1990; Ting et al., 1992). Also, the promoter region of the human cytochrome c1 gene has a short region of homology to an RTVL-I LTR, suggesting it may have derived part of its transcriptional regulatory sequences from a HERV element.
The ERV3 element promotes the expression of readthrough transcripts that splice into a Kruppel-related gene encoding a zinc-finger protein (Kato et al., 1990). The observation that this transcript is highly expressed in placenta but not in choriocarcinomas, which derive from placental tissues, suggests this transcript may encode a biologically relevant function. However, a direct link between the downregulation of this ERV3 fusion transcript and oncogenic transformation has not been proven (Kato et al., 1990). An RTVL-H LTR promotes the expression of a transcript that initiates within the 5’ LTR but then splices into a gene, designated PLA2L, that contains two phospholipase A2-related domains (Feuchter-Murthy et al., 1993). The θ1 gene in the human α-globin locus utilizes a truncated Alu sequence for an essential part of the promoter region (Kim et al., 1989). This degenerate Alu provides the CCAAT sequence for normal promoter function, and this role appears to be conserved in the great apes. Finally, two adjacent cis-acting regulatory elements involved in the cell-specific regulation of transcription of the γ chain of Fc and T cell receptors have been identified as both being part of an Alu repeat (Brini et al., 1993). These elements function as both positive and negative regulators and are recognized by DNA binding proteins in a cell-specific manner.

Examples of ERV promoted/regulated genes have also been reported in other organisms, including the rat oncomodulin gene (Banville and Boie, 1989) and the mouse MIPP gene (Chang-Yeh et al., 1991), both of which are initiated within an IAP LTR; and the mouse sex-limited gene (Stavenhagen and Robins, 1988; Adler et al., 1991), an ancient retroviral-like element upstream of this gene conferring androgen inducibility on this coding region.

The LTRs of HERVs and retrotransposons may also supply a polyadenylation signal to an adjacent cellular transcript.
cDNAs polyadenylated within THE-1 LTRs (Paulson et al., 1987), HERV-E LTRs (Tomita et al., 1990) and RTVL-H LTRs (Mager, 1989) have been identified, but it has not been determined if they are associated with genes. An LTR of the mouse retrotransposon family MuRRS has donated a polyadenylation signal to a transcript identified in the cell line A20/2J (Baumruker et al., 1988). Alu and L1 sequences have also altered the site of polyadenylation in a murine class I gene and the mouse thymidylate synthase-encoding gene, respectively (Kress et al., 1984; Harendza and Johnson, 1990).

3. MEDIATORS OF GENETIC VARIATION

The presence of transposable elements in a genome represents the potential for generating genetic variability. De novo insertions can result in genes being brought under the control of new regulatory sequences that may change tissue specificity or the developmental timing of gene expression. Also, retroelements, in addition to directing their own transposition, may allow for the nonspecific reverse transcription and integration of other cellular transcripts, the origin of processed pseudogenes. Very rarely, such reverse transcribed sequences may retain or acquire (possibly through the integration of a retrotransposon) a promoter, allowing the newly integrated copy, now referred to as a retrogene, to be expressed. Examples of this novel mechanism of gene duplication have been reported and include the autosomal human phosphoglycerate kinase genes (Boer et al., 1987; McCarrey and Thomas, 1987), the human prointerleukin-1β gene (Clark et al., 1986), the mouse autosomal pyruvate dehydrogenase E1α gene (Dahl et al., 1990), the mouse ZFa gene (Ashworth et al., 1990), the second rat preproinsulin I gene (Soares et al., 1985), the chicken muscle-specific calmodulin gene (Gruskin et al., 1987) and the woodchuck N-myc 2 gene (Fourel et al., 1990). It has also been suggested that the intronless genomic structure of S.
cerevisiae may be the result of a similar mechanism coupled with the high propensity for the reverse transcribed processed copy to integrate via homologous recombination, thereby replacing the intron-containing genomic copy (Fink, 1988).

Transposable elements, by virtue of their repetitive and dispersed nature, can also create gene duplications through recombination events between homologous elements that result in the duplication of the intervening sequences. Duplication of the γ-globin gene has been demonstrated to have been caused by homologous but unequal crossing over between two L1 sequences that flank the original gene (Fitch et al., 1991). Similarly, the human growth hormone gene and the haptoglobin-related locus have been duplicated by recombination events mediated by Alu sequences (Barsh et al., 1983; Erickson et al., 1992).

Gene duplications, whether generated via retroelement-mediated recombination events or via retrogene formation, represent a source of nonessential gene sequences upon which evolutionary processes can act to select for novel functions. Retroelements can also contribute to the generation of new functions by supplying novel regulatory sequences that may alter the levels and timing of gene expression. This potential for generating new genetic functions represents a selective advantage to a species in an ever changing environment.

In conclusion, although it cannot be said that retroelements have evolved to meet some adaptive need, they have acquired roles due to their widespread genomic distribution and novel structural properties (Brosius, 1991). From this perspective, transposable elements could well have originated in a selfish manner; however, once distributed chromosomally, individual elements could then diverge and, due to their genetic effects, adopt a function (Doolittle and Sapienza, 1980). 
Furthermore, such functions, once they have arisen, can then come under selective pressure that either enhances or diminishes the role (Brosius, 1991).

The debate as to what role, if any, retroelements play in the host genome is usually carried out at the level of the individual organism. The detrimental effects of retroelements on the individual cannot be disputed. However, retroelements function to increase the genetic diversity of the species as a whole. Variability can be considered a species-level characteristic related to species survival, and a reservoir of genetic variation within species acts as a hedge against extinction (Lloyd and Gould, 1993). Species that concentrate adaptations very narrowly are favored by natural selection at a given moment, but they sacrifice the plasticity to face the inevitable change in their surroundings. It is interesting to note that retroelements are normally tightly regulated, thereby reducing mutational load and allowing for selective optimization of phenotype for the current environment. However, retroelements appear to be activated by environmental and genomic stresses, supporting their potential for allowing rapid genetic change. This potential, concerted activation within populations may also speed up speciation events (Fontdevila, 1992).

F. THESIS OBJECTIVES

The overall goal of this thesis was to examine the impact that endogenous retroviruses or retrotransposons have on the human genome. Transposable elements are recognized as a major source of genetic change and mutagenic activity in the genomes of lower eukaryotes, but their importance in higher eukaryotes is only beginning to be appreciated. We and others have hypothesized that mammalian retrotransposons play a significant role in genome variability and gene regulation. My work has focused on the RTVL-H family of HERVs and what possible impacts these elements may have had and may continue to have on the human genome. 
Three aspects of RTVL-H biology were examined:

(1) The RTVL-H family was examined in other primate species representing the major branches of the primate lineage to address the following questions: How stable has the RTVL-H family been in the course of primate evolution? When in the primate lineage did the expansion to 1000 copies take place? Finally, can a unique evolutionary history be described for the RTVL-H family as has been described for other retroelement families? My work to answer these questions is the subject of Chapter III.

(2) The structure of RTVL-H elements and their distribution in the genome suggest that RTVL-H elements amplified via retrotransposition. In Chapter IV, I present work done to address the following questions: Can additional evidence be found for RTVL-H transposition via an RNA intermediate? Also, are RTVL-H elements still capable of retrotransposition? Work was done towards establishing an assay system for the detection of RTVL-H retrotranspositions within an experimental time frame.

(3) RTVL-H LTRs contain the regulatory signal required for the transcription of that element. In Chapter V, I address the question: Can the presence of an RTVL-H element also affect the regulation of genes in adjacent cellular DNA?

CHAPTER II

MATERIALS AND METHODS

Cell lines

Cell lines used in the retrotransposition (RT) experiments (Chapter IV) included the human cell lines Tera-1 (teratocarcinoma) and 5637 (bladder carcinoma), available from the American Type Culture Collection (ATCC); NTera2D1 (teratocarcinoma), kindly provided by Peter Andrews (Andrews et al., 1984); and the mouse packaging cell line GP+E86 (Markowitz et al., 1988). 
Human and primate cell lines used as sources of DNAs included three cell lines from the NIGMS Human Genetic Mutant Cell Repository representing three human inbred populations, Amerindian (repository # GM10968), Melanesian (# GM10540) and Pygmy (# GM10494), and various primate cell lines, all available through the ATCC: chimpanzee (WES skin cell line), gorilla (ROK cell line), gibbon (MLA 144 lymphoma cell line), orangutan (CP18.5K normal skin cell line), baboon (26CB-1 lymphoblastoid cell line), African green monkey (CV-1 kidney cell line) and marmoset (HSV-Silva 40 cell line). Cell lines were cultured in Dulbecco’s modified Eagle medium (DMEM) with 10% fetal calf serum (FCS) (Tera-1, NTera2D1, GP+E86, GM10968, GM10540, GM10494, WES, CP18.5K, CV-1) or in RPMI 1640 with 10% FCS (5637, ROK, MLA 144, 26CB-1, HSV-Silva 40). All cells were maintained at 37°C with 5% CO2.

Transfections

Subconfluent cell cultures were transfected with supercoiled plasmid DNA by calcium phosphate precipitation (Graham and van der Eb, 1973). Plasmid DNA (for CAT assays, 10 µg; for the generation of RT cell lines, 20 µg) was resuspended in 500 µl of 0.25 M CaCl2 and added dropwise to 500 µl of 2xHBS buffer (50 mM Hepes, 280 mM NaCl, 1.5 mM Na2HPO4, pH 7.12) while gently bubbling N2 through the HBS/DNA mixture. The resulting solution was incubated at room temperature for 30 min and then added directly to the cell cultures (10^6 cells/100 mm tissue culture dish for Tera-1, NTera2D1 and GP+E86 (CAT assays), or 2x10^5 cells/60 mm dish for GP+E86 (RT experiments)). Fresh medium was added to the cells 1 day post transfection. For CAT assays, cells were harvested 2 days post transfection. For the generation of RT cell lines, drug selection was started 3 days post transfection. 
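As a quick check on the transfection mixture described above, the following Python sketch works out the concentrations that result when the 500 µl CaCl2/DNA solution is combined with the 500 µl of 2xHBS. It is illustrative only and not part of the original protocol: the function and variable names are hypothetical, and it assumes that the two volumes are simply additive and that the DNA adds no appreciable volume.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    # Simple dilution: C_final = C_stock * V_stock / V_total.
    return stock_conc * stock_volume / total_volume


# Volumes in microlitres, as given in the text.
CACL2_VOLUME_UL = 500.0
HBS_VOLUME_UL = 500.0
TOTAL_UL = CACL2_VOLUME_UL + HBS_VOLUME_UL  # assumes additive volumes

# 2xHBS components in mM, as given in the text.
HBS_2X_MM = {'Hepes': 50.0, 'NaCl': 280.0, 'Na2HPO4': 1.5}

# Final concentration (mM) of each buffer component in the 1 ml mixture.
final_mix = {name: final_concentration(conc, HBS_VOLUME_UL, TOTAL_UL)
             for name, conc in HBS_2X_MM.items()}

# The CaCl2 stock is 0.25 M, i.e. 250 mM.
final_mix['CaCl2'] = final_concentration(250.0, CACL2_VOLUME_UL, TOTAL_UL)

# DNA concentration (micrograms per ml) for the two uses in the text.
DNA_UG = {'CAT assay': 10.0, 'RT cell lines': 20.0}
dna_ug_per_ml = {use: ug / (TOTAL_UL / 1000.0) for use, ug in DNA_UG.items()}

for name, conc in sorted(final_mix.items()):
    print(name, conc, 'mM')
for use, conc in sorted(dna_ug_per_ml.items()):
    print(use, conc, 'ug/ml')
```

Under these assumptions the precipitate forms in 1xHBS with 125 mM CaCl2, at 10 or 20 µg DNA per ml.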
For GP+E86 cells, cells were passaged up to 100 mm dishes just prior to the start of drug selection.

CAT assays

The CAT expression vectors pSVAOCAT(X) and pSV2ACAT have been previously described (Kadesch and Berg, 1986; Kadesch et al., 1986; Henthorn et al., 1988) and were kindly provided by Dr. Paula Henthorn and Dr. Tom Kadesch. The cH-4 3’ LTR and H6 LTR CAT vectors were kindly provided by Anita Feuchter (Feuchter, 1991; Feuchter and Mager, 1990). The cH-4 5’ LTR CAT vector was constructed as follows: the 5’ LTR of the cH-4 element was isolated as a StuI/EarI fragment from a genomic clone containing the 5’ end of the cH-4 element (Mager and Goodchild, 1989), blunted and ligated into the HindIII site of pSVAOCAT(X).

Cell extracts were assayed for CAT activity by the method of Gorman et al. (1982). Briefly, cells were harvested 2 days post transfection, resuspended in 100 µl 0.25 M Tris-HCl, pH 8.0, and lysed by repeated freeze/thawing. Cell debris was removed by centrifugation at 10,000 rpm for 5 min. The resulting extract was then assayed for CAT activity at 37°C for ~60 min, with 20 µl of a solution of 4 mM acetyl coenzyme A and 0.2 µCi of 14C chloramphenicol used as a substrate. Reactions were resolved by thin layer chromatography and autoradiography.

Generation of retrotransposition (RT) cell lines

Cells were cotransfected with 19 µg RT vector and 1 µg pRC6, a plasmid containing the hygromycin resistance gene, kindly provided by Dr. Rob Kay. For GP+E86 cells, duplicate 60 mm dishes at 2x10^5 cells/dish were transfected. For Tera-1 cells, duplicate 100 mm dishes at 10^6 cells/dish were transfected. Three days post-transfection, cultures were placed under hygromycin selection (GP+E86 cells were passaged into 100 mm dishes just prior to the start of selection). GP+E86 cells were grown in 0.2 mg/ml hygromycin, Tera-1 cells in 0.1 mg/ml hygromycin, for 3 weeks. 
For Tera-1 cultures, isolated colonies were picked and the resulting cell lines were analyzed by PCR and Southern blotting for the presence of integrated, unrearranged RT sequences. For GP+E86 cultures, cells had to be replated at a much lower cell density (duplicate 150 mm dishes at 100 cells/dish) in order to get isolated colonies. The resulting cell lines were analyzed as above. Cell lines found to contain integrated, unrearranged RT sequences are subsequently referred to as “RT cell lines”. RT cell lines were analyzed by PCR and Northern blotting for expression of the integrated RT vector.

Selection systems for transposition events

For the selection of spontaneous HPRT- mutants (i.e., lacking the enzyme hypoxanthine guanine phosphoribosyl transferase), cells were cultured in 6-thioguanine (6-TG) for 3 weeks (NTera2D1 in 7.5-8.5 µg/ml 6-TG, 5637 in 0.3 µg/ml 6-TG, as determined by dose response curves). The resulting 6-TG resistant clones were expanded and DNA was isolated for Southern analysis to look for gross rearrangements in the HPRT gene. To confirm the loss of HPRT activity, several 6-TG resistant clones were split and one half of the culture was counterselected in medium containing 13 µg/ml hypoxanthine, 0.19 µg/ml aminopterin and 3.9 µg/ml thymidine (HAT).

RT cell lines shown to be expressing RT sequences were expanded and put under G418 selection (G418 is a neomycin analogue) (GP+E86 in 1 mg/ml, Tera-1 in 0.25 mg/ml). G418-resistant clones were analyzed by PCR for the presence of an integrated, spliced neo-int cassette, which would indicate the retrotransposition of the marked RTVL-H element (see Fig. 4-7, p. 144).

Construction of the neo-int cassette and the retrotransposition (RT) vectors

The “neo-int” retrotransposition indicator gene cassette was constructed by inserting the large intron from the human Gγ globin gene (Slightom et al., 1980) in the antisense orientation within the neo^r gene of the vector pMC1neo-polyA (Thomas and Capecchi, 1987). 
The final plasmid, pNeo-int, was constructed in several steps (Fig. 2-1a): 1) A BamHI/EagI fragment from pMC1neo-polyA was subcloned, cleaved at the PstI site and a BbsI linker introduced. 2) An XhoI/AlwNI fragment containing all but the 39 bp of the 5’ end of the human Gγ globin second intervening sequence (IVS2) was isolated and inserted into a plasmid containing a linker flanked by BbsI sites that served to reconstruct the remainder of the 5’ end of the intron. 3) The intron was then excised with BbsI and subcloned into the BbsI sites previously introduced within the neo^r gene. To reconstitute the rest of the neo^r gene, this construct was then inserted back into pMC1neo-polyA, which had been cloned as a blunted HindIII/XhoI fragment into pBluescript KS (Stratagene). Restriction sites shown at the termini of the final construct in Fig. 2-1a can be used to excise it. The sequences at the intron junctions, confirmed by sequencing, are also shown in Fig. 2-1a, as is the predicted sequence of the spliced copy. The single nucleotide change introduced as a consequence of the cloning procedure, C --> A, is shown, but this does not alter the neo^r amino acid sequence (on the opposite strand). The intron introduces an in-frame termination codon (*) that would result in a truncated neo^r protein.

The JZEN/neo-int retrotransposition (RT) vector was constructed by introducing the neo-int cassette into a polylinker (inserted into the XhoI site) of the JZEN vector (a modification of pMPZen) (Johnson et al., 1989). The RTVL-H/neo-int RT vector was constructed as follows (Fig. 2-1b): A genomic clone containing the 5’ end (~2 kb) of cH-4 and a cDNA clone containing the 3’ end (~2.7 kb) of cH-4 were both inserted into the polylinker of Bluescript KS such that a short stretch of polylinker containing a unique SalI site remained between the two halves. 
The neo-int cassette was inserted into this SalI site in both orientations.

Figure 2-1. (A): Construction of the neo-int retrotransposition (RT) indicator gene cassette. The sequences at the intron junctions and the predicted sequence of the spliced copy are shown. The single nucleotide change introduced as a consequence of the cloning procedure, i.e., C --> A, is indicated. Also indicated is the in-frame termination codon (*) introduced by the insertion of the intron into pMC1neo. Symbols: white boxes, neo^r coding sequences; stippled box, thymidine kinase (TK) promoter and polyoma virus enhancer; diagonally striped box, polyadenylation signal from the TK gene; black and white striped box, Gγ IVS2; small arrows, primers used for PCR analysis; SD, splice donor; SA, splice acceptor. Restriction enzymes: A, AlwNI; B, BamHI; Bb, BbsI; Bx, BstXI; C, ClaI; G, EagI; H, HindIII; K, KpnI; P, PstI; S, SalI; Sa, SacII; X, XhoI. (B): Construction of the RTVL-H/neo-int RT vector. Solid lines indicate RTVL-H sequences; broken lines indicate plasmid vector sequences. The open boxes represent the LTRs. The neo-int cassette is indicated. Restriction enzymes: H, HindIII; K, KpnI; N, NotI; P, PstI; S, SalI; Xb, XbaI.

Figure 2-2. Recombinant PCR strategy used to mutate the 5’ most base of the 3’ LTR of the RTVL-H/neo-int RT vector from C --> T. The heavy lines represent RTVL-H/neo-int RT vector sequences; the open box represents the 3’ LTR. The horizontal arrows labelled Mut1-Mut4 represent the primers used in the recombinant PCR strategy (see Table 2-1 for primer sequences). The arrow labelled 3’-int represents the primer used in sequencing to confirm the base change. Thin lines represent PCR products. The ‘X’ indicates the base changed to a ‘T’. Steps labelled in the figure: “left” PCR with primer mutagenesis; “right” PCR with primer mutagenesis; remove primers and denature/renature; 3’ extension; PCR with outside primers.

A recombinant PCR strategy (Higuchi, 1990) was used to mutate the 5’ most base of the 3’ LTR of the RTVL-H/neo-int RT vector from C --> T (see Fig. 2-2). Four specific primers were used (see Table 2-1 for primer sequences). The two “outside” primers were Mut1 and Mut4, Mut1 being a 25mer encompassing a unique BbsI site in the internal sequence of the cH-4 RTVL-H element and Mut4 a 25mer encompassing a unique NotI site in the polylinker just downstream of the 3’ LTR. These two unique restriction sites were used to substitute the recombinant PCR product in place of the corresponding BbsI/NotI fragment of the original RTVL-H/neo-int plasmid. Mut2 and Mut3 were the two “inside” overlapping primers, sense and antisense for the same 30 bp region encompassing the site to be mutated and synthesized with the C --> T change incorporated into the primers. Two primary PCRs were performed under standard conditions (see section on PCR analysis), one reaction using primer pair Mut1/Mut3, the other Mut2/Mut4. Amplification products were gel purified to remove primers. 
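The statement that Mut2 and Mut3 are sense and antisense primers for the same 30 bp region can be checked directly from the sequences listed in Table 2-1. The short Python sketch below is illustrative only and was not part of the original work; the helper names are hypothetical. It confirms that the two inside primers are exact reverse complements of each other, which is the property that lets the two primary PCR products anneal to one another across the mutated site.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}


def reverse_complement(seq):
    # Read the sequence backwards and swap each base for its partner.
    return ''.join(COMPLEMENT[base] for base in reversed(seq.upper()))


def fully_overlapping(primer_a, primer_b):
    # True when the two primers cover exactly the same region on
    # opposite strands.
    return (len(primer_a) == len(primer_b)
            and reverse_complement(primer_a) == primer_b)


# Inside primer sequences as listed in Table 2-1.
MUT2 = 'GCTTGGGCTCAGAGGCCTGACATTCCTGCC'
MUT3 = 'GGCAGGAATGTCAGGCCTCTGAGCCCAAGC'

print(len(MUT2), len(MUT3))
print(fully_overlapping(MUT2, MUT3))
```

Both primers are 30 nt long and each is the reverse complement of the other, in agreement with the description above.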
These overlapping primary products were then used in a second PCR with primers Mut1 and Mut4 to generate the full length secondary product. PCR parameters for this second PCR were as follows: after a 2 min 95°C denaturation step, 5 cycles of 94°C, 30 sec; 45°C, 30 sec; 72°C, 1.5 min were performed, the decreased annealing temperature allowing for heteroduplex formation between the two overlapping primary PCR products. Twenty additional cycles were performed using an annealing temperature of 55°C. The expected 1400 bp recombinant PCR product was gel purified and subcloned into Bluescript (Stratagene), and sequencing was done to confirm the C --> T change. The BbsI/NotI fragment was then reisolated and subcloned into the RTVL-H/neo-int vector. During subcloning procedures, bacterial incubations were done at 30°C to minimize LTR-LTR recombination. Large scale plasmid growths were also done at 30°C. Plasmid DNA was isolated using the Qiagen plasmid kit (Qiagen).

Probes

The RTVL-H LTR subtype-specific probes are underlined in Fig. 3-5. The Type I-specific probe spans nucleotide (nt) positions 121-215 in Fig. 3-5 and was isolated from the Type I 3’ LTR of RTVL-H4 (Mager, 1989) as an 85 bp HpaI/BstNI fragment. The Type Ia-specific probe spans nt positions 137-225 in Fig. 3-5 and was isolated as a 70 bp NcoI/AluI fragment from the Type Ia LTR H6 (Mager, 1989). The Type II-specific probe spans nt positions 176-335 in Fig. 3-5 and has been previously described (Mager, 1989). The RTVL-H pol region-specific probes, specific for three of the four pol regions (designated A, B, C and D) that are deleted in most RTVL-H elements, are shown schematically in Fig. 1-3 (p. 27). The A region probe is a 136 bp BspHI/EcoRV fragment that is entirely within region A and has been previously described (Wilkinson et al., 1993). The B region probe is a 135 bp StyI/BbvII fragment isolated from a PCR clone generated from chimpanzee DNA using primers that spanned the B deletion (unpublished results). 
The C region probe is a 403 bp MluI/BglI fragment that is entirely within region C and has also been previously described (Wilkinson et al., 1993). The RTVL-H env region probe is a 600 bp EcoRI/SstI fragment isolated from EL3.61a, an env-containing clone generated by PCR using a pol-region/3’ LTR primer pair (unpublished data). Other RTVL-H probes used include: a general LTR probe, a 380 bp StuI/HindIII fragment isolated from RTVL-H2 (Mager and Freeman, 1987); an LTR U3-specific probe, a 323 bp StuI/SphI fragment isolated from the H6 LTR (Mager, 1989); LTR U5-specific probes, a 108 bp HindIII/SspI fragment isolated from RTVL-H2 (Mager and Freeman, 1987) and a 102 bp HindIII/DraI fragment isolated from the 3’I element, RTVL-H1 (Mager and Henthorn, 1984), both previously described in Mager, 1989; a 5’ internal probe, a 140 bp XmeI/NdeI fragment from RTVL-H1 (Mager and Henthorn, 1984); and a 3’ internal probe, a 600 bp SstI/StuI fragment from RTVL-H1 (Mager and Henthorn, 1984).

The HPRT-specific probe used was the 1.6 kb BamHI fragment of the full length HPRT cDNA p4aA8 (Jolly et al., 1983), kindly provided by Laura Reid (University of North Carolina, Chapel Hill). The neo-specific probe is an 853 bp MluI/SalI fragment from pMC1neo (Thomas and Capecchi, 1987). Probes specific for cPj-LTR included a 760 bp EcoRI/StuI fragment that encompasses the entire non-LTR portion of the cPj-LTR cDNA and a smaller subprobe, a 120 bp StyI/StuI fragment (see Fig. 5-2). The cPp-LTR probe was a 390 bp StyI/StuI fragment upstream of the LTR (see Fig. 5-2).

Probes were radioactively labeled by the random primer method (Feinberg and Vogelstein, 1983).

Library screening

Four primate genomic libraries were screened as part of the evolution study (Chapter III). Charon 32 bacteriophage libraries of partially digested EcoRI fragments of genomic DNA from human, chimpanzee, and orangutan were kindly provided by Jerry Slightom (Upjohn). 
The Charon 4A library of partially digested EcoRI fragments from formosa monkey DNA was obtained from the ATCC. The human term placenta cDNA library in λgt11, used in the polyadenylation study (Chapter V), was provided by Paula Henthorn and Mitchell Weiss (Henthorn et al., 1986).

Plates were lifted using standard protocols (Sambrook et al., 1989). Hybridizations were performed at 65°C in 6xSSC (1xSSC is 0.15 M NaCl, 0.015 M Na citrate), 1x Denhardt’s (0.02% Ficoll, 0.02% bovine serum albumin (BSA), 0.02% polyvinyl pyrrolidone), 1% sodium dodecyl sulphate (SDS). Final washes were at 65°C in 0.5xSSC, 0.2% SDS for 30-40 min.

To estimate copy numbers, the fraction of hybridizing phage plaques was determined and this number was used to calculate the number that would hybridize in 2x10^5 phages (the approximate number of phages with an average insert of 14 kb needed to cover the whole genome of 2.9x10^9 bp).

DNA isolation and Southern analysis

DNA was isolated from cells using standard procedures (Sambrook et al., 1989) or, in some cases for PCR, using a rapid procedure that involved lysing ~1000 cells in 0.06 M sucrose, 2 mM Tris (pH 7.5), 1 mM MgCl2, 0.2% Triton-X and 100 µg/ml proteinase K at 50°C for 1 hr. A panel of human DNAs, isolated from primary human lymphocytes, was kindly provided by Dr. Ali Turhan. 
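The copy-number estimate described under Library screening reduces to a simple proportion, sketched below in Python. This is an illustration rather than the procedure as actually carried out: the function names are hypothetical, the plaque counts in the example are invented for demonstration and are not data from this work, and the calculation assumes that each hybridizing plaque carries a single copy of the element. The sketch uses the unrounded genome/insert ratio (about 2.07x10^5), whereas the text rounds this to 2x10^5 phages.

```python
def phages_for_one_genome(genome_size_bp, mean_insert_bp):
    # Number of non-overlapping inserts needed to span the genome once.
    return genome_size_bp / mean_insert_bp


def estimate_copy_number(hybridizing_plaques, plaques_screened,
                         genome_size_bp=2.9e9, mean_insert_bp=14000):
    # Scale the observed fraction of positive plaques up to one
    # genome equivalent of phage inserts.
    fraction = hybridizing_plaques / plaques_screened
    return fraction * phages_for_one_genome(genome_size_bp, mean_insert_bp)


# Genome size and mean insert size as given in the text.
genome_equivalent = phages_for_one_genome(2.9e9, 14000)

# Hypothetical plate counts, for illustration only.
example_estimate = estimate_copy_number(hybridizing_plaques=12,
                                        plaques_screened=4000)

print(round(genome_equivalent))
print(round(example_estimate))
```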
For Southern analysis, DNAs (except for PCR products) were restriction digested, electrophoresed in 0.8-1.5% agarose gels, and transferred to zetaprobe membranes (BioRad) by the alkaline blotting procedure (Reed and Mann, 1985). Transfers of cloned DNAs and PCR amplified DNAs were hybridized at 65°C in 6xSSC, 1x Denhardt’s, and 0.5-1% SDS in the presence of ~3x10^5 cpm/ml denatured probe. Transfers of genomic DNAs were hybridized at 65°C (except for certain blots containing non-primate DNAs, which were hybridized at 55°C) in 3xSSC, 0.2% Ficoll, 0.2% polyvinyl pyrrolidone, 20 mM Na phosphate, pH 6.8, 2% SDS, 0.2% BSA in the presence of 50 µg/ml denatured sheared salmon sperm DNA and 2-6x10^[?] cpm/ml of denatured radioactive probe. Probe and salmon sperm carrier DNA were denatured by incubation in 1/10 volume of 2 M NaOH at room temperature for 15 min, followed by neutralization with 1/10 volume of 2 N HCl. Post hybridization washes for blots of cloned or PCR amplified DNAs were at 65°C in 1-3xSSC, 0.5-1% SDS for 45-60 min. For blots of genomic DNAs, washes were at 65°C in 1-3xSSC, 1% SDS for 2 x 45-60 min, or with the second wash at higher stringency, 0.1-0.2xSSC, 1% SDS.

RNA isolation, cDNA synthesis and Northern analysis

Total and polyA+ RNAs from a variety of human cell lines and primary placental tissues were kindly provided by Dave Wilkinson (Wilkinson et al., 1990). Total RNA was isolated from hygromycin-resistant clones by a quick RNA extraction procedure (Brenner et al., 1989) that involved lysis of ~10^[?] cells (for PCR) or 10^6 cells (for Northern analysis) in 4 M guanidine isothiocyanate, 0.5% sarkosyl, 25 mM Na citrate, 1% 2-mercaptoethanol, followed by isopropanol precipitations.

For cDNA synthesis, RNAs were reverse transcribed using random hexamers as primers and Moloney murine leukemia virus reverse transcriptase. 
Conditions for the 30 µl reactions were: 100 ng random hexamers, 500 µM dNTPs, 30 U RNase inhibitor, 200 U reverse transcriptase in 50 mM Tris-HCl pH 8.0, 60 mM KCl, 3 mM MgCl2, 10 mM dithiothreitol, 100 µg/ml BSA. The reactions were carried out for 1 hr at 40°C, followed by an inactivation step of 5 min at 95°C (Brenner et al., 1989). Five to ten µl of each cDNA reaction were used per PCR.

For Northern analysis, total cellular and polyA+ RNAs were electrophoresed in 1-1.2% agarose gels containing 0.66 M formaldehyde, using 1xMOPS buffer [20 mM 3-(N-morpholino)propanesulfonic acid, 5 mM sodium acetate, 1 mM EDTA pH 7.0] at 1-3 V/cm and then transferred to zetaprobe membranes (BioRad) in 10xSSC. Hybridizations were carried out at 65°C in 1.5xSSPE (20xSSPE is 3 M NaCl, 0.2 M NaH2PO4, 0.02 M EDTA), 5x Denhardt’s, 1% SDS in the presence of 500 µg/ml denatured sheared salmon sperm DNA and 2-6x10^[?] cpm/ml denatured probe. Final post-hybridization washes were at 55°C in 0.1xSSC, 1% SDS.

DNA sequencing and computer analysis

Fragments were subcloned into plasmid vectors and sequenced using the dideoxy chain termination method modified for use with double stranded templates and T7 DNA polymerase (Tabor and Richardson, 1987). Sequence comparisons and alignments were done with the aid of the software package of the Genetics Computer Group (Devereux et al., 1984).

PCR analysis

Standard PCRs were carried out under the following conditions: 20 mM Tris pH 7.5, 50 mM KCl, 2.5 mM MgCl2, 0.01% BSA, 250 µM each dNTP, 1.25 U Taq polymerase, ~30-40 pmoles each primer and 0.5 µg genomic DNA or 1-2 ng plasmid DNA as template. PCR cycle parameters were: 2 min at 95°C, 1x; 30 sec - 1 min at 94°C, 30 sec - 1 min at 45-55°C, 30 sec - 1.5 min at 72°C, 25-30x; 3 min at 72°C, 1x (Innis et al., 1990). PCR products were analyzed by electrophoresis in 1-2% agarose gels and, when necessary, by Southern analysis. The sequences of all primers used in the various PCR strategies are listed in Table 2-1. 
To check for possible contaminating DNAs, each set of reactions also included a “no DNA added” control. These controls were uniformly negative.

Table 2-1. Sequences of oligonucleotide primers used in the various PCR strategies. Linker sequences containing restriction enzyme sites are shown in lower case letters.

Figure     Primer                  Sequence
2-1/4-5    Neo1                    tcatgtcgacATTCGGCTATGACTGGGCAC
2-1/4-5    Neo2                    agattctagaGGTGAGATGACAGGAGATCC
2-2        Mut1                    TCCCVI’ACGGTCCTCCGTCTTCAAG
2-2        Mut2                    GCTTGGGCTCAGAGGCCTGACATTCCTGCC
2-2        Mut3                    GGCAGGAATGTCAGGCCTCTGAGCCCAAGC
2-2        Mut4                    CGGTGGCGGCCGCTCTAGAACTAGT
2-2/3-7    3-int/primer6           CCGACACTTCAACACTA
3-1        ABC-1                   gtcgcggatccGACTGACCCTGACACCCA’fl’
3-1        ABC-2                   CTCT/CGCTI’CTCACCIIWfl’C
3-1        ABC-3                   AAGACAGCAGAGACTGC
3-1        ABC-4                   GCAGTCTCTAAAGCTGTCTT
3-1        ABC-5                   gcacgggatcccTAGTCCTTl’TGCAAGAGCIFGAGGG
3-1        ABC-6                   gtcgcggatcccGAAAGGAAATGAGAGtYl1C
3-7        RTVL-H1 primer1         GCCACAAGGTAAGITPGTAC
3-7        RTVL-H1 primer4         C’IT[9’GCCATATCCTGTAGG
3-7        RTVL-H3 primer1         GCAAAGTACACTATGAC
3-7        RTVL-H3 primer4         AGC’VPGCAAAGAGGTAAC
3-7        So12 primer1            CCAGACTCTGGTGTGT
3-7        So12 primer4            TGGCAATGGTGGCCATG
3-7        PB-3 primer1            GTATCAGGGTAACCAACT
3-7        PB-3 primer4            GCTAGTCACCTAACTATG
3-7/5-2    primer1                 ‘11CTI’CTTGGGTAGTCCACT
3-7        Po primer1              CAATGACCTTGAGCACAGCT
3-7        Po primer4              TACTGGTCACTGTCTTCTGG
3-7        HpFor1 primer1          AGGTGACTACCTCCTGGTAG
3-7        HpFor1 primer4          ACGTC’ITPCAGCCAGACCAC
3-7        cH-4 primer1            GAGVI’ACAGTGAAAGTCGCT
3-7        cH-4 primer4            ATCAGTCTCCTCTGCTIVIAC
3-7        IaAlu1 primer1          AAAACTGAAAAGAGAC
3-7        IaAlu2 primer4          CAGTCCAAAAAACCAGTAAC
3-7        IaAlu2 primer4a         CCACAATCATGATGAGGTAC
3-7        IaAlu3 primer4          CATCTGATC’ITrCACAGCAC
3-7        IaAlu3 primer4a         CAGTATGTGCCAGGTAGCTA
3-7/5-2    primer2/primer3         TGGCTTG/AGCTTGGGCTCAGAGGCCT
3-7/4-3    primer3/U5 primer       G’n9GGTGGTCTCrI’CACACG
3-7        primer2a                aagtcgcggccgcTTCACAGGGTTAATCAC
3-7        primer3a                aagtcgcggccgcGTGATTAACCCTGTGAA
3-7        primer5                 TCACGGAGCAAAGAACAGGA
4-1        SD-5’                   caagtaggatCCCAAGGAACATCTCACCAA
4-1        SA-3                    caagtaggatCCfl’CCACTGTGAGAG’fl’AC
4-3        U3-1                    TGTCAGGCCTCTGAGCCCAAGC
4-3        U3-2                    CAGATGGCCTGAAGTAACTGAAG
4-3        HUM-S1 Jxn primer1      CACACGACCTCTACACGCCT
4-3        HUM-S1 Jxn primer2      GACCTCTACACGCCTAAACT
4-3        GIB-S1 Jxn primer       TGAAGGTCACCCCAGGATTA
4-3        GIB-S2 Jxn primer       A’fl’CGGCTC’11CTAGCCTAA
4-3        GIB-S3 Jxn primer       TCCGTAAGGAACCGGGCTAA
4-3        ORANG-S1 Jxn primer     AAAGGTCACCCGAGGGTIAA
N/A        Alu1                    tcatgtcgacGCGAGACTCCATCTCAAA
5-2        primer1                 TACATTCTTGGTGGCACGTC
5-2        primer2                 GGAGAACATCTGCCATCAAG
5-2        primer4                 AAGTACCTTCTCAAGGGTGG

For the evolution study (Chapter III), a PCR strategy was designed to determine integration times for individual RTVL-H elements during primate evolution. PCRs were performed using primers that spanned individual RTVL-H elements and/or the cellular/LTR junctions (see Fig. 3-7 for PCR strategy, p. 114; see Table 2-1 for primer sequences). Flanking primers (primer locations 1 and 4, Fig. 3-7) were generated for a number of previously isolated elements (listed in Table 3-2, p. 116) representing the three LTR subtypes. Flanking sequence information for additional Type Ia elements was obtained through PCR using a Type Ia-specific primer (nt 178-197, both sense and antisense strands; coordinates refer to Fig. 3-5a) and an Alu-specific primer (Brooks-Wilson et al., 1990), taking advantage of the likely presence of an Alu element within amplifiable distance of some Type Ia LTRs. Flanking sequence information for three additional Type Ia elements was obtained by subcloning and sequencing PCR-generated fragments. When flanking primers were available for both sides of an element (RTVL-H1, RTVL-H3, So12, PB-3, Po, HpFor1, cH-4; see Table 3-2), PCRs were performed using the paired flanking primers and each flanking primer paired with the appropriate LTR primer (see Fig. 3-7c). Whether there is no element, a full length element, or a solitary LTR at a given locus, an amplified product would be obtained with the appropriate primer pair. The LTR primers were designed using sequence comparisons of several LTRs: primer 2, nt 5-29, antisense strand; primer 3, nt 417-437, sense strand (coordinates refer to Figure 3-5a). 
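The way the paired-primer PCRs are read can be summarized as a small decision rule. The Python sketch below is only an illustration: the function name, the argument names and the exact rules are assumptions, since the scoring actually used follows Fig. 3-7, which is not reproduced here. It treats a product from either flanking/LTR primer pair (1 + 2 or 3 + 4) as evidence that an element is present, and a product from the flanking pair alone (1 + 4) as an empty site. Distinguishing a full length element from a solitary LTR is left to an optional result from a flanking primer paired with an internal sequence primer (primers 5 and 6, described later in this section).

```python
def classify_locus(flank_pair, junction_5p, junction_3p, internal=None):
    # flank_pair:  product seen with flanking primers 1 + 4
    # junction_5p: product seen with flanking primer 1 + LTR primer 2
    # junction_3p: product seen with LTR primer 3 + flanking primer 4
    # internal:    product seen with a flanking + internal primer pair,
    #              or None when that reaction was not run
    element_present = junction_5p or junction_3p
    if not element_present:
        # Only the flanking pair can show that the site is empty.
        return 'no element' if flank_pair else 'inconclusive'
    if internal is None:
        return 'element present, type undetermined'
    return 'full length element' if internal else 'solitary LTR'


# Hypothetical result patterns, for illustration only.
print(classify_locus(True, False, False))
print(classify_locus(True, True, True, internal=False))
print(classify_locus(False, True, True, internal=True))
```

In this simplified rule a locus with no product from any primer pair is reported as inconclusive rather than as lacking an element, since a failed reaction cannot be told apart from a true negative.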
For four elements (cPj-LTR, IaAlu1, IaAlu2, IaAlu3; see Table 3-2), flanking information was available for only one side of the element and thus only PCRs across the element/cellular junction could be performed (see Fig. 3-7d). An amplified product would only be obtained if an element (full length or solitary LTR) was present at that locus. The “no element” situation could not be directly determined. In these cases, several combinations of LTR and flanking primers were used (see Fig. 3-7d) to strengthen the conclusion that “no amplified product” represents “no element”. LTR primers 2 and 3 in Fig. 3-7d are the same as in Fig. 3-7c; primers 2a and 3a represent the Type Ia-specific primers described above for the isolation of the three IaAlu “elements”. Flanking primers were also paired with internal sequence primers (primer positions 5 and 6, see Fig. 3-7d) to determine whether the elements were full length or solitary LTRs. Internal sequence primers were designed using a prototypical RTVL-H sequence (RTVL-H2, Mager and Freeman, 1987): primer 5, nt 503-522, antisense strand; primer 6, nt 5310-5326, sense strand (coordinates from Mager and Freeman, 1987).

CHAPTER III

EXAMINATION OF THE EVOLUTIONARY HISTORY OF THE RTVL-H FAMILY OF HUMAN ENDOGENOUS RETROVIRAL ELEMENTS

Some of the data presented in this chapter have been incorporated into the following manuscripts:

Wilkinson, D.A., Goodchild, N.L., Saxton, T.M., Wood, S. and Mager, D.L. 1993. Evidence for a functional subclass of the RTVL-H family of human endogenous retrovirus-like sequences. J. Virol. 67: 2981-2989.

Goodchild, N.L., Wilkinson, D.A. and Mager, D.L. 1993. Recent evolutionary expansion of a subfamily of RTVL-H human endogenous retrovirus-like elements. Virology 196: 778-788.

INTRODUCTION

The work described in this chapter examines the evolutionary history of the RTVL-H family of human endogenous retroviral elements within the primate lineage. 
For several human retroelement families, evolutionary histories, independent of that of the host species, have been documented. For both the Alu (Britten et al., 1988; Quentin, 1988; Matera et al., 1990; Batzer and Deininger, 1991; Jurka and Milosavljevic, 1991; Shen et al., 1991) and L1 (Skowronski and Singer, 1986; Scott et al., 1987; Jurka, 1989) families, subfamilies have been identified that have apparently arisen and expanded at different times during primate evolution. The origin of L1 sequences is believed to be quite ancient since L1 families have been identified in many mammalian species and since related sequences are found in lower eukaryotes and in plants (reviewed in Hutchison et al., 1989). In contrast, the Alu family is believed to have arisen independently in primates from a 7SL RNA gene (Ullu and Tschudi, 1984). THE-1 elements are believed to be a part of a larger superfamily of sequences named ‘Mammalian apparent LTR-retrotransposons’ or MaLRs (Smit, 1993) that also includes the primate MstII repeats and the MER15 and MER18 families, and in rodents, ORR-1 and MT elements (Smit, 1993). Evidence suggests that the first MaLRs were distributed before the radiation of eutherian mammals 80-100 million years ago. The THE-1 family represents a younger group in the MaLR superfamily present in simians only. The question arises as to whether a distinct evolutionary history can be identified and described for the RTVL-H family.

We have previously reported the existence of different RTVL-H subfamilies, designated Type I, Type II and Type Ia, based on LTR sequence differences (Mager, 1989). The LTR sequences are very similar over the first 80 bp of the U3 region and over the last 70 bp (R and U5) of the LTR. The middle region, all within U3, is distinct between subtypes. The subtypes appear to have different functional capabilities. 
For instance, characterization of heterologous transcripts in placenta which are polyadenylated within an RTVL-H LTR indicates that most such LTRs are Type II (Mager, 1989; see Chapter V). Furthermore, Type Ia LTRs have the strongest promoter activity in transient reporter gene assays (Feuchter and Mager, 1990; unpublished results). RTVL-H elements can also be distinguished by whether they are intact or deleted. The RTVL-H elements first identified had deletions in the pol region and lacked an env domain. Recently, work by our laboratory (Wilkinson et al., 1993; Wilkinson and Mager, 1993) and others (Hirose et al., 1993) has identified RTVL-H elements having an intact pol and possessing an env domain. We refer to such elements as RTVL-Hp. Here I have examined the relative distributions of the three LTR subfamilies and of intact and deleted elements in different primate species to determine their evolutionary relationships and to assess their relative stabilities within the primate lineage. I have identified two major expansions of the RTVL-H family that have occurred during primate evolution. The first involved the expansion of deleted elements in a common ancestor of Old World monkeys and apes. The second expansion involved a more recent amplification, in a common ancestor of the apes, of deleted elements associated with a new LTR subtype. I would like to acknowledge the contributions to this work made by Dave Wilkinson, who provided the sequence analysis of the repeat structures of the LTR subtypes shown in Figure 3-5, by Dixie Mager, who did much of the genomic library screening and Southern analysis using the marmoset probes, and by Doug Freeman, who sequenced the marmoset clone.

RESULTS
Amplification of deleted elements in the Old World primate lineage: The first expansion was identified by analyzing the structure of the RTVL-H pol genes. The initial RTVL-H elements sequenced in this region contained four deletions (referred to as A, B, C and D; see Fig. 1-3, p. 
27) relative to the Mo-MLV pol region (Wilkinson et al., 1993). However, we have recently identified some elements, termed RTVL-Hp, with more intact pol domains. DNA fragments were isolated that were specific for the A, B and C regions (Fig. 3-1) and were used to probe a panel of primate genomic DNAs. Hybridization of a Southern blot with the A region-specific probe is shown in Figure 3-2 and reveals that multiple copies of RTVL-Hp elements are dispersed within the human genome and are present in similar copy numbers within the genomes of apes and Old World monkeys.

Figure 3-1. A schematic diagram of a portion of a theoretical RTVL-Hp element showing pol regions A, B and C (boxed). Region-specific probes are indicated by the hatched boxes below the element. PCR primer positions are indicated by the small arrows labelled ABC-1 through ABC-6 (primer sequences are given in Table 2-1). Scale bar, 100 bp.

Figure 3-2. Southern analysis of a panel of EcoRI-digested primate DNAs using the probe specific for pol region A. Lane 1, human; 2, chimpanzee; 3, gorilla; 4, gibbon; 5, orangutan; 6, baboon; 7, African green monkey; 8, marmoset. (A): The filter was washed under moderately stringent conditions (30 min., 65°C, 2xSSC, 1% SDS) and exposed for four days. (B): The same filter was rewashed under high stringency conditions (30 min., 65°C, 0.2xSSC, 0.1% SDS) and exposed for 13 days. Size markers (kb): 21.0, 9.4, 5.8, 4.4, 2.3, 1.6, 1.0.

Hybridization of Southern blots with probes specific for regions B and C produced banding patterns very similar to that seen with the A region probe (data not shown). Interestingly, the A region probe, but not B and C, detects a few distinct bands in the DNA of a New World monkey, marmoset, under moderately stringent conditions (Fig. 3-2A). The reason for this may be that sequences in the reverse transcriptase domain, in which region A is located, are the most highly conserved part of pol genes (Doolittle et al., 1989). 
A high stringency wash eliminated hybridization to marmoset DNA but not to Old World monkey DNA (Fig. 3-2B). At lower stringencies, a few distinct bands can be detected in marmoset DNA with B and C probes as well as with a standard RTVL-H internal probe (data not shown). These results suggest that marmoset does contain a few RTVL-Hp related sequences. Based on the banding patterns observed on the Southern blots and the frequency of clones obtained from screening genomic libraries with B and C region probes (data not shown), it is estimated that 50-100 RTVL-Hp elements containing at least one of the pol regions A, B or C are present within the human genome.
It has been difficult to determine the number of RTVL-Hp elements that contain all three regions intact. The resolution of Southern blot analysis is not sufficient to identify common hybridizing bands. Also, there is the problem of choosing restriction enzymes that do not cut within the pol region. This was also a problem with the library screening procedure, in that all of the genomic libraries screened were partial EcoRI libraries and there is a highly conserved EcoRI site between the B and C regions. Therefore, I devised a PCR approach that involved using various primer pair combinations that span the A-B-C region (see Fig. 3-1). PCR products were analyzed by gel electrophoresis and by Southern analysis using the A, B and C region probes. These PCR data indicated that deletions involving A, B and C occur in various combinations among RTVL-H elements. The different possible structures have not been fully defined, since complicated patterns are detected by PCR amplification of most primate DNAs (see Fig. 3-3). Some bands could not be assigned a pol domain structure (i.e. in terms of having A and/or B and/or C) as they had unexpected sizes and hybridization patterns. These bands may derive from elements having other deletions.

Figure 3-3. PCR among primate species using primers within pol region A (ABC-2) and downstream of region C (ABC-6) (see Fig. 3-1). Lane 1, no DNA negative control; 2, human; 3, chimpanzee; 4, gorilla; 5, gibbon; 6, orangutan; 7, baboon; 8, African green monkey; 9, marmoset. The top arrow indicates the 1.6 kb fragment amplified from elements having the A, B and C pol regions. The lower arrow indicates the 1.1 kb fragment amplified from elements having only A and B regions.

An example of the results obtained from a PCR using one ABC primer pair, primers ABC-2 and ABC-6 (see Fig. 3-1), is shown in Figure 3-3. Several observations can be made. (1) All species examined have a band of 1.6 kb corresponding to amplification from elements having an intact pol domain (Fig. 3-3). Comparison of band intensities across species suggests that these elements are fewer in number in the higher primates. (2) There appear to be certain deleted elements present in apes that are not found in the Old World monkey species examined. For example, the apes appear to have elements containing regions A and B but not C (based upon the size of the amplified band, 1.1 kb, and its hybridization to A- and B- but not C-region specific probes), while the presence of such deleted elements was not detected in the Old World monkeys (Fig. 3-3). This may be related to the order in which regions were deleted and the subsequent expansion of the various deleted elements in different species. However, the complicated PCR amplification patterns obtained made it difficult to determine the order in which the deletions may have occurred in primate evolution. It may be that deletion of a particular region occurred more than once. However, examination of a number of C region-deleted elements showed the same deletion breakpoints, suggesting that the deletion occurred only once with subsequent expansion of that element. A region and B region deletion breakpoints also appear to be identical in different elements. (3) Other bands seen in Fig. 
3-3 were of unexpected sizes and hybridization patterns and could not be assigned a pol domain structure simply in terms of having A and/or B and/or C. For example, the ~800 bp band detected in all Old World species hybridized to both B and C region probes and should also have the A region, as an A region-specific primer was used in the PCR. However, this band is only 800 bp in length and therefore must have partially deleted A, B and C regions and/or deletions within the intervening sequences. To definitively assign pol domain structures, amplified bands would have to be isolated and sequenced. (4) Finally, and perhaps most interestingly, PCR amplification from marmoset DNA using primers ABC-2 and ABC-6 yielded only the 1.6 kb band corresponding to the presence of elements having an intact pol domain (see Fig. 3-3). Subsequent cloning of this band resulted in the isolation of several highly related clones, one of which has been sequenced in its entirety in our laboratory. This clone shares ~75% identity with the intact RTVL-H sequence reported by Hirose et al. (1993), indicating that this 1.6 kb fragment does represent amplification from an RTVL-H related element present in the marmoset genome. Like all human RTVL-H elements so far examined, this marmoset sequence contains stop mutations that would prevent translation of the pol ORF. Interestingly, the additional bands derived from the various deleted elements observed in the Old World species are not observed in marmoset. This result, taken together with the results from the Southern analysis (see Fig. 3-2), suggests that New World monkeys contain a small number of sequences related to RTVL-H elements and that these sequences are intact in the pol region. These findings imply that the expansion of deleted elements occurred within a common ancestor of catarrhines, subsequent to divergence from New World monkeys. 
In addition, the lack of hybridization under high stringency conditions, and the failure of probes from other regions of pol to hybridize to marmoset DNA except under low stringency conditions, indicate that the sequences in New World monkeys are significantly diverged from the RTVL-H family in humans. Because of this, our laboratory isolated probes from the 1.6 kb marmoset PCR fragment specific for region C and for the area between regions B and C and hybridized them back to marmoset genomic DNA. While hybridization with the marmoset region C probe yielded only a few bands, hybridization with the second probe yielded ~50 bands (data not shown). Thus, it appears that marmoset does contain pol deleted elements. However, these sequences have not amplified to the same extent as in the Old World lineage, where deleted elements are present in nearly a thousand copies. The single product obtained in the PCR shown in Fig. 3-3 may be due to the fact that primer ABC-2 is within the A region; all elements in marmoset containing the A region may be intact throughout the pol domain.
In addition to deletions in the pol region, the RTVL-H elements originally isolated and characterized did not have an env domain. This observation suggested that the RTVL-H family might be similar to the retrotransposons of lower eukaryotes such as copia of Drosophila and Ty of yeast. However, the recent identification of RTVL-H elements containing an env domain (Hirose et al., 1993; Wilkinson and Mager, 1993) has shown that RTVL-H is a true endogenous retrovirus family. I have performed Southern analysis to show that env-containing elements are present in all primate species examined (Fig. 3-4). In the Old World species, these appear to be present in 50-100 copies. Even in marmoset there were multiple bands, which were generally of stronger intensity than when the A, B or C pol probes were used (Fig. 3-4a). This is of interest since the env domain is typically less well conserved among retroviruses. 
A higher stringency wash did eliminate hybridization to marmoset DNA (Fig. 3-4b).
To summarize this part of the study, there appears to have been a significant expansion of RTVL-H elements in the ancestor to the Old World primate lineage after the divergence from New World monkeys. The New World monkey, marmoset, appears to contain a few intact RTVL-H related elements as well as ~50 deleted sequences. This contrasts with all Old World species examined, which have 50-100 copies of elements retaining one or more of the three pol regions (A, B or C) and the env domain, and 800-900 copies of deleted elements. The similar copy numbers in all Old World species examined suggest that the expansion of RTVL-H elements occurred early within the Old World lineage, before the divergence of Old World monkeys and apes (30 MYr), and that these elements have been relatively stable in the primate genome since that time.
Recent expansion of a subfamily of deleted elements: I have also identified a second expansion of deleted RTVL-H elements that occurred more recently, within the ape lineage after its divergence from Old World monkeys 30 MYrs ago. It involves the amplification of a subfamily of elements defined by sequence differences in the LTRs.

Figure 3-4. Southern analysis of a panel of EcoRI-digested primate DNAs using an env-specific probe. Lane 1, human; 2, chimpanzee; 3, gorilla; 4, gibbon; 5, orangutan; 6, baboon; 7, African green monkey; 8, marmoset. (A): The filter was washed under moderately stringent conditions (40 min., 65°C, 2xSSC, 1% SDS) and exposed for 3 days. (B): The same filter was rewashed under high stringency conditions (40 min., 65°C, 0.2xSSC, 1% SDS) and exposed for 4 days. Size markers (kb): 21.0, 9.4, 5.8, 4.4, 2.3, 1.6.

Figure 3-5. RTVL-H LTR subtypes. (A): Sequence comparison of three RTVL-H subtypes. The Type I sequence presented is a consensus (Mager, 1989), with R indicating a purine and Y indicating a pyrimidine. The Type Ia sequence is from cDNA isolate cH-4 (Mager, 1989) and the Type II sequence is from a genomic clone containing the cPB-3 LTR (Mager, 1989). The LTR subtypes can be characterized on the basis of two different sequence repeats. Shaded boxes represent the 49 bp “Type I” repeats and the unshaded boxes represent the 27-32 bp “Type II” repeats. The region in each LTR sequence that is underlined in black was isolated and used as a subtype-specific probe. (B): A schematic representation of the sequence structure of the RTVL-H LTR subtypes. Open boxes represent sequence common to all three subtypes. Shaded boxes represent Type I repeat sequences, and striped boxes represent Type II repeat sequences. The black boxes represent sequence in common between Types I and Ia. As indicated, a recombination event between Type I and Type II LTRs, occurring after the second Type I repeat in a Type I LTR and after the first Type II repeat in a Type II LTR, would generate a recombinant with a Type Ia structure.
[Figure 3-5 image: panel A, aligned Type I, Type Ia and Type II LTR sequences; panel B, schematic of the subtype structures with a 50 bp scale bar.]

Structures of RTVL-H LTR subtypes: Our laboratory has previously identified RTVL-H LTR subtypes based on DNA sequence differences that occur primarily in the U3 region (Mager, 1989; Wilkinson, 1993). Sequence comparisons of the three LTR subtypes that are currently recognized, Type I, Type Ia and Type II, are shown in Figure 3-5a. The different LTR subtypes share common sequences in the 5’ terminal 80 bp of the U3 region and in the R and U5 regions, but are distinguished by distinctive patterns of short tandemly repeated sequences within the U3 region. Type I LTRs typically have two tandemly repeated copies of a 49 bp sequence designated a “Type I repeat”. 
Type II LTRs have one copy of this Type I repeat followed by a variable number of tandemly repeated copies of a 27-32 bp sequence, termed a “Type II repeat” (the representative Type II element shown in Figure 3-5 has five copies of this Type II repeat). The third RTVL-H subfamily consists of LTRs that do not correspond entirely to either Type I or Type II, having one copy of the Type I repeat and one copy of the Type II repeat, followed by sequences shared with Type I LTRs (Fig. 3-5). This suggests that these LTRs represent a recombinant LTR subtype that most likely arose from a recombination event between Type I and II LTRs in the manner illustrated in Figure 3-5b. Because the sequence of this recombinant subtype is most similar to Type I, it was originally termed Type Ia (Mager, 1989). The predicted reciprocal recombination product has not been detected as yet.
Copy numbers of the LTR subtypes in various primate species: To analyze the genomic complexity of the different LTR subtypes, Southern transfers of EcoRI-digested primate genomic DNAs were sequentially probed with the three subtype probes (Figure 3-6). The Type II-specific probe hybridized to multiple fragments in all seven Old World primate species examined (Fig. 3-6c). The complexity of the banding pattern was similar across these species, although band intensity was somewhat greater in human and chimpanzee. For the Type I-specific probe, multiple bands were also seen in all seven Old World primates, but the band intensity across the panel decreased more markedly than with the Type II probe (Fig. 3-6b). Sequence divergence in the lower primates could be the cause of this intensity decrease, which might be expected to be more pronounced when using the shorter Type I probe (85 bp versus 160 bp for the Type II probe). The hybridization pattern observed with the Type Ia-specific probe was generally less complex, suggesting that this LTR subtype is present in fewer copies than Types I and II (Fig. 3-6d). 
In addition, only very few faint bands were seen in baboon and African green monkey (Fig. 3-6d). This result could reflect sequence divergence or could indicate that Type Ia LTRs are present in significant numbers only in the higher primates. No hybridization was seen in marmoset, a New World monkey, under the conditions used, with any of the three LTR probes. However, data presented earlier in this chapter did show that marmoset DNA does contain some RTVL-H-related sequences.

Figure 3-6. Southern analysis of primate DNAs using the LTR subtype-specific probes. (A): EtdBr-stained gel of a panel of EcoRI-digested primate DNAs. (B-D): Autoradiographs of a Southern transfer of the gel in (A), sequentially hybridized with probes for Type I (B), Type II (C), Type Ia (D). Lane 1, human; 2, chimpanzee; 3, gorilla; 4, gibbon; 5, orangutan; 6, baboon; 7, African green monkey; 8, marmoset. Size markers (kb): 20.0, 9.4, 5.8, 4.4, 2.3, 1.6, 1.1.

As another way of analyzing the LTR subtypes in different primate species, three genomic libraries, representing human, chimpanzee and Old World formosa monkey, were screened with the LTR subtype probes and with a larger LTR probe which detects all subtypes. Copy numbers per haploid genome were estimated from the frequency of positive clones and results are summarized in Table 3-1. All three species have similar total numbers of RTVL-H elements as estimated by hybridization to the general LTR probe. In addition, each has a similar number of Type I and Type II LTRs, with Type I making up approximately 50% of elements and Type II 30%. In human and chimpanzee, Type Ia elements were estimated to be present in approximately 100 copies, which is in accord with the less complex pattern seen on Southern blots (Fig. 3-6d). However, there are very few detectable Type Ia LTRs in formosa monkey. Only two very weakly hybridizing phage plaques were detected in 5.7 x 10^4 plaques screened, corresponding to a copy number estimation of 7 Type Ia LTRs in the formosa monkey genome. 
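The arithmetic behind a plaque-frequency estimate of this kind can be sketched as follows. This is only an illustration: the haploid genome size (3 x 10^9 bp) and the mean library insert size (15 kb) are assumed values that are not stated in this chapter, and the function name is hypothetical. With these assumed values, 2 positive plaques in 5.7 x 10^4 screened works out to roughly 7 copies.

```python
def estimate_copy_number(positive_plaques, plaques_screened,
                         genome_size_bp=3.0e9, mean_insert_bp=1.5e4):
    # ASSUMPTION: genome_size_bp and mean_insert_bp are illustrative
    # defaults, not values taken from the text.
    # Number of clones needed to represent one haploid genome once.
    clones_per_genome = genome_size_bp / mean_insert_bp
    # Fraction of positive clones scaled to one genome equivalent.
    return positive_plaques / plaques_screened * clones_per_genome


# Two weakly hybridizing plaques in 5.7 x 10^4 plaques screened.
print(round(estimate_copy_number(2, 5.7e4)))  # prints 7
```

The estimate scales linearly with the assumed genome and insert sizes, so a different library would shift the result proportionally.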
The weak hybridization of these formosa monkey plaques made the subtype assignment uncertain, however, and the isolation and sequencing of these phage clones would be required to definitively determine the LTR subtype.
The fraction of LTRs that are solitary in the genome (i.e. not associated with internal RTVL-H sequences) was also determined by rehybridizing the same filters to internal probes from the 5’ and 3’ portions of the element. This analysis showed that a significant percentage (approximately 40%) of both Type I and Type II elements are represented as solitary LTRs, while Type Ia elements are represented mainly by full length elements (13% solitary). Solitary LTRs most probably arise through LTR-LTR recombinational events that delete the internal sequences. The number of solitary LTRs might be expected to accumulate with time and thus may be one indicator of the relative evolutionary ages of the subfamilies.

Table 3-1. Copy numbers(a) of RTVL-H LTR subtypes in different species
LTR type    Human    Chimpanzee    Formosa monkey
Total       2400     2200          2400
Type I      1200     1100          1100
Type II     750      650           750
Type Ia     130      120           (5)(b)
(a) Numbers include both full length elements and solitary LTRs.
(b) Some of these may represent cross-hybridization to the Type I probe.

Thus, the library screening results also suggest that Type Ia LTRs are present in significant numbers only in the higher primates and represent a small and more recently amplified RTVL-H subfamily.
Determination of integration times of individual RTVL-H elements: A PCR strategy, outlined in Figure 3-7, was subsequently used to directly determine when individual RTVL-H elements with different LTR subtypes integrated and became fixed in the primate lineage. A similar strategy has been used by Shih et al. (1991) to investigate integration times of other HERV elements. A number of RTVL-H clones, isolated from genomic and cDNA libraries, were available for which flanking sequence information was known for both sides of the element (see Table 3-2). 
In these cases, primers that flanked the element were used in PCRs to determine if (1) no element was present (Figure 3-7a), (2) the locus contained a solitary LTR (Figure 3-7b), or (3) a full length element was present (Figure 3-7c). For the first two cases, locations of primers were chosen such that the expected PCR product was easily amplifiable, being in the range of 150-700 bp. However, if a full length element was present, PCR using flanking primers would not yield an amplified product visible on a gel, as the expected size of such a product is >6 kb. To confirm the presence of an element in these cases, additional PCRs were carried out using primers within the LTR and pairing them with the appropriate flanking primer (see Figure 3-7c). In this way a positive result was obtained for each of the three possibilities with at least one primer pair combination.
In some cases, sequence information was available for only one side of the element. Here, only PCRs across the LTR/cellular junction could be performed (Fig. 3-7d). The appearance of an amplified band indicated the presence of an element. A full length element was distinguished from a solitary LTR by pairing the flanking primer with a primer within the internal sequence of the element (see Figure 3-7d). One drawback in these cases is that the situation of “no element” (Figure 3-7a) did not yield a band (i.e. was a negative result). 
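For loci with flanking sequence on both sides, the decision logic described above can be sketched as a small function. This is a hedged illustration only: the function name, the LTR length passed in, and the size tolerance are hypothetical and are not values given in the text. The return codes follow the symbols used in Table 3-3 (F, S, -, ?).

```python
def classify_locus(flanking_product_bp, junction_product_seen,
                   empty_site_bp, ltr_bp, tolerance_bp=30):
    # flanking_product_bp: size of the product from the two flanking
    #   primers (primers 1 and 4), or None if no band was seen.
    # junction_product_seen: True if an LTR primer paired with a flanking
    #   primer (1 and 2, or 3 and 4) gave a band.
    # ASSUMPTION: ltr_bp and tolerance_bp are illustrative parameters.
    if flanking_product_bp is not None:
        if abs(flanking_product_bp - empty_site_bp) <= tolerance_bp:
            return '-'  # unoccupied site: no element
        if abs(flanking_product_bp - (empty_site_bp + ltr_bp)) <= tolerance_bp:
            return 'S'  # product longer by one LTR: solitary LTR
        return '?'      # band of unexpected size
    if junction_product_seen:
        return 'F'      # too long to amplify across: full length element
    return '?'          # no band with any primer pair


print(classify_locus(220, False, empty_site_bp=220, ltr_bp=450))   # prints -
print(classify_locus(None, True, empty_site_bp=220, ltr_bp=450))   # prints F
```

Loci with flanking sequence on one side only cannot return the '-' code with this logic, since an absent band is their only evidence for an empty site.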
When using DNA from different species, a negative result of this kind is a potential problem, since not observing the expected band could represent “no element” or could indicate that sufficient sequence divergence had occurred such that the primers no longer bind and amplify efficiently. To try to reduce the likelihood of this problem, multiple flanking primers and LTR primers were used in different combinations in some cases.

Figure 3-7. PCR strategy for examining individual RTVL-H loci across primate species. Genomic DNA is represented by thick lines (solid lines are RTVL-H sequences and broken lines are unrelated cellular sequences). LTRs are shown by the large open boxes. The small numbered arrows above and below the genomic sequences indicate the locations of the primers used. When sequence information was available for both sides of the integration site, sizes of PCR products obtained using primers 1 and 4 distinguished between (A) no element at the locus and (B) a solitary LTR. When primers 1 and 4 did not yield an amplified fragment, the suspected presence of a full length element (C) was confirmed by using primers 1 and 2 and/or 3 and 4. (D) When sequence information was available for only one side of the integration site, primers 1/1a and 2/2a or 3/3a and 4/4a indicated the presence of an element, while primers 1/1a and 5 or 6 and 4/4a indicated the presence of a full length element. Scale bar, 1 kb.

Table 3-2. RTVL-H elements analyzed for time of integration
Element    LTR type   Source                             Flanking seq. available   Reference
RTVL-H1    Type I     Human genomic library              5’ & 3’                   Mager & Henthorn, 1984
RTVL-H3    Type I     Human genomic library              5’ & 3’                   Mager 1989
Sol2       Type I     Human genomic library              5’ & 3’
PB-3       Type II    Human cDNA & genomic libraries     5’ & 3’
cPj-LTR    Type II    Human placental cDNA library       5’                        Goodchild et al. 1992
Po         Type II    Human placental cDNA library       5’ & 3’
HpFor1     Type II    Formosa monkey genomic library     5’ & 3’                   unpublished
cH-4       Type Ia    Human cDNA & genomic libraries     5’ & 3’                   Mager & Goodchild, 1989
IaAlu1     Type Ia    Human genomic PCR clone            5’                        this study
IaAlu2     Type Ia    Human genomic PCR clone            3’
IaAlu3     Type Ia    Human genomic PCR clone            3’

Figure 3-8. PCR analysis of the RTVL-H3 integration site in primates. A representative EtdBr-stained gel of PCR products obtained using the strategy outlined in Fig. 3-7 is shown. Lanes 1-9, PCR with primers 1 and 4; 10-15, PCR with primers 1 and 2; 16-18, PCR with primers 3 and 4. Lanes 1, 10, 16, no DNA negative control; 2, 11, 17, human; 3, 12, chimpanzee; 4, 13, gorilla; 5, 14, 18, gibbon; 6, 15, orangutan; 7, baboon; 8, African green monkey; 9, marmoset. Size markers (kb): 1.6, 1.1, 0.5, 0.2.

Table 3-2 lists the 11 elements that were examined: three Type I, four Type II and four Type Ia. An example of the type of PCR results obtained is shown in Figure 3-8 for the Type I element, RTVL-H3. PCR done using flanking primers yielded no bands for human, chimpanzee, gorilla, gibbon, and orangutan (lanes 2-6), suggesting the presence of a full length element, and a band of 220 bp corresponding to “no element” in baboon, African green monkey, and marmoset (lanes 7-9). PCRs across the LTR/cellular junctions yielded the expected bands for an element present in human, chimpanzee, gorilla, gibbon, and orangutan (Figure 3-8, 200 bp, primers 1 and 2, lanes 11-13, 15; 125 bp, primers 3 and 4, lanes 17-18). The PCRs across the LTR/cellular junctions also yielded additional fainter bands. This was seen whenever the repetitive RTVL-H-specific primers were used and most likely represents spurious priming of other RTVL-H elements.
PCR results for all 11 elements are summarized in Table 3-3 and the approximate integration times for these elements are shown graphically in Fig. 3-9. For only two of the 11 elements (RTVL-H3 and Sol2) were conclusive results obtained across the entire panel of primate species examined. For the remaining elements, results were not obtained for one or more species, probably due to sequence divergence preventing efficient priming. Of the Type I elements examined, one element (Sol2) integrated prior to the split between the ape and Old World monkey lineages, while a second (RTVL-H3) integrated into the ape genome after the split from Old World monkeys. 
For the third Type I element (RTVL-H1), results indicated a full length element present in the apes but no results were obtained for the Old World monkeys. This may represent sequence divergence in the lower primates. Alternatively, since this element, located 3’ to the β-globin gene cluster, is embedded within an L1 element (Mager and Henthorn, 1984), it may be that the L1 sequence also is not present. To investigate this possibility, PCR was done with primers from outside the L1 sequence but the result was inconclusive (data not shown).

Table 3-3. Summary of PCR analysis of 11 RTVL-H loci in primates
Columns, in order: Type I: RTVL-H3, Sol2, RTVL-H1; Type II: PB-3, HpFor1, Po, cPj-LTR; Type Ia: cH-4, IaAlu1, IaAlu2, IaAlu3.
Human:          F S F S F S + F + + +
Chimpanzee:     F S F S S + F + + +
Gorilla:        F S F S F F + F + + +
Orangutan:      F F F - F + - (-) (-) (-)
Gibbon:         F S F F F + - (-) (-) (-)
Baboon:         - F ? F + - (-) (-) (-)
O.W. monkey:    - F F F + ? (-) (-) (-)
Marmoset:       - - ? ? ? (-) (-) (-) (-)
[Entries are listed in the order extracted; rows with fewer than 11 entries had blank cells whose column positions are not recoverable here.]
F, full length element; S, solitary LTR; -, no element; +, full length element or solitary LTR; ?, indicates that no determination could be made (5’ and 3’ flanking primers available); (-), indicates that the “no element” result is based on negative PCR data (only one flanking primer available).

Figure 3-9. Diagram illustrating the integration times of the individual RTVL-H elements presented in Table 3-3. Elements are indicated by their LTR subtype designation only. Brackets indicate those elements for which complete results were not obtained; what is indicated is the lowest branchpoint at which these elements were determined to be present. The time scale is given below in MYr. Approximate species divergence times were taken from Gingerich (1984) and Sibley and Ahlquist (1987).
[Figure 3-9 image: tree of human, chimpanzee, gorilla, orangutan, gibbon, Old World monkey and New World monkey, with a time scale from 50 to 10 MYr.]

Three of the four Type II elements appear to have integrated prior to the ape/Old World monkey divergence. 
For HpFor1, results were not obtained for some of the primates examined, but because a positive result was obtained for representatives of both apes (human, gorilla, gibbon) and Old World monkeys (African green monkey), and because this element was in fact isolated from formosa monkey, this element must have been present before the divergence. The presence of this element in marmoset could not be determined. The Po and cPj-LTR elements are present in all Old World primates examined and their presence in marmoset could not be determined. For cPj-LTR, an element for which sequence information was available for only one side of the element, I could not determine whether this element was full length in any of the species. PCR with internal primers (see Fig. 3-7d) gave multiple bands with no prominent band of the expected size (data not shown). The fourth element (PB-3) appears as a solitary LTR in the great apes but there was no evidence for the corresponding full length element in the lower primates. No results were obtained for any primate lower than orangutan, but because the orangutan result was “no element”, I have assumed that all lower primates would also not have an element at this locus. Thus, it appears that this element integrated into the genome after the great apes split from the lesser apes but underwent an LTR-LTR recombination event giving a solitary LTR before the radiation of the great apes. Such deletion events appear to be a natural part of RTVL-H evolution, also being seen for the Type II element Po and the Type I element Sol2.
All four Type Ia elements appear to be present only in the great ape lineage, indicating integration in a common ancestor after the split from orangutan (17 MYr) but before the gorilla lineage diverged from the human/chimpanzee lineage (9 MYr). For three of these elements, flanking sequence information was available for only one side of the element, so primers could not be used to amplify the unoccupied integration site in the lower primates. 
In such instances, the possibility that the lack of amplification with primer pairs as shown in Fig. 3-7d is due to sequence divergence rather than the absence of the element cannot be totally ruled out. However, to reduce the likelihood of a sequence divergence problem, several primer pair combinations were used in these cases, and bands indicating the presence of an element were never observed in any of the lesser apes or Old World monkeys. Furthermore, successful amplification of all other eight loci studied in either orangutan or gibbon or both (see Table 3-3) strongly suggests that the lack of amplification in these species at the three Type Ia loci reflects an unoccupied integration site rather than sequence divergence preventing efficient PCR. In accord with this conclusion is the finding that the average sequence divergence in non-coding DNA between human and orangutan is only 3.4% (Goodman et al., 1989).
Interestingly, although the cH-4 Type Ia element has integrated relatively recently, analysis of its internal region indicates that it shares the same pol and env deletions found in most other RTVL-H sequences. We have also observed through library screenings that a probe derived from a structurally intact element, which is specific for a pol region deleted in most RTVL-H elements (Wilkinson et al., 1993), does not hybridize to any clones positive for the LTR Type Ia probe (data not shown). These findings suggest that most if not all elements with Type Ia LTRs are derived from a partially deleted sequence that subsequently amplified in the genome.
RTVL-H associated polymorphisms: The identification of successive expansions of RTVL-H elements during primate evolution raises the question as to whether this family is still expanding within the human genome. I approached this question of ongoing transposition in a number of ways, several of which will be discussed in detail in the next chapter. 
Here I present my efforts to demonstrate new integrations by looking for RTVL-H associated polymorphisms in the human population. Panels of DNAs from unrelated individuals were probed with the various pol and LTR subset probes and examined for banding differences that would suggest new integrations. Representative Southern blots, in which a panel of human DNAs, including DNAs from individuals representing three different inbred populations (Amerindian, Melanesian, Pygmy), has been probed with the region C pol probe and the Type Ia LTR probe, are shown in Fig. 3-10. Regions of band differences as well as differences in band intensities are seen between samples but the differences were difficult to reproduce, and band differences seen with one restriction enzyme could not be identified using other restriction enzymes. Thus it was not possible to assign any of the differences observed to new integrations.

Figure 3-10. Representative Southern blots showing a small panel of human DNAs probed with two RTVL-H subset probes. (A): BglII-digested DNAs probed with the pol region C-specific probe. (B): EcoRI-digested DNAs probed with the Type Ia LTR-specific probe. Lanes 1 and 2, DNA samples isolated from primary lymphocytes of two unrelated individuals; lanes 3, 4 and 5, DNA samples isolated from cell lines derived from individuals representing three inbred populations, Pygmy (lane 3), Melanesian (4), and Amerindian (5).

An RTVL-H-associated polymorphism was observed using a PCR strategy to detect spliced, integrated RTVL-H elements. This polymorphism, discussed in detail in the next chapter, appears to represent an RTVL-H processed pseudogene that has not yet become fixed in the population (see Fig. 4-1c, p.
132). Finally, genetic variation between individuals can also be the result of LTR-LTR recombination, as has been demonstrated previously (Mager and Goodchild, 1989).

DISCUSSION

In this chapter I have identified successive evolutionary expansions of the RTVL-H family of human endogenous retroviral elements. It appears that RTVL-H sequences are derived from ancestral elements that were present in the primate lineage before the divergence of New World monkeys from other primates. After this divergence 45 MYr ago, there occurred a dramatic expansion of deleted RTVL-H elements to high copy number (800-900 copies) in a common ancestor of the Old World primate species. These elements were of the Type I and Type II LTR subfamilies. A second expansion of deleted elements associated with a third LTR subtype, Type Ia, occurred more recently, this subfamily having arisen in the primate lineage after the ape/Old World monkey divergence (30 MYr ago) and having expanded to moderate copy number (approximately 100) in the great apes.

Other HERV families that have been examined have also arisen early in primate evolution but their copy numbers are all much lower (1-100) than that of RTVL-H. The Class I HERVs, including HERV-E (cited in Repaske et al., 1985), ERV1 (unpublished data cited in O’Connell et al., 1984), HERV-R (ERV-3) (O’Connell et al., 1984; Shih et al., 1991), RRHERV-I (Kannan et al., 1991), RTVL-I (Shih et al., 1991), and HuERS-P1 and HuERS-P2 (Harada et al., 1987) are all found in Old World monkeys, apes and humans, while HRES-1 (Perl et al., 1989) and HuERS-P3 (Harada et al., 1987; Kroger and Horak, 1987) related sequences have also been reported in New World monkeys.
Similarly, the Class II HERVs (HERV-K) have been found in Old World monkeys, apes and humans (Mariani-Costantini et al., 1989). The THE-1 family of sequences (Paulson et al., 1985) also appeared in the primate lineage very early and subsequently expanded to approximately 10,000 copies in primates (Paulson et al., 1985; Schmid et al., 1990).

Our analyses also indicate that a large fraction of RTVL-H elements exist as solitary LTRs. This finding has also been made for THE-1 (Paulson et al., 1985) and several HERV families (Steele et al., 1984; Lania et al., 1992; Leib-Mosch et al., in press), indicating that recombination between LTRs of an element has been a frequent occurrence in the evolutionary history of these types of sequences. For several individual RTVL-H elements examined in this study, I was able to follow the conversion of full length elements to solitary LTRs. This type of recombinational event is an ongoing process, since we have previously reported an LTR-LTR recombination involving the cH-4 element in two siblings not found in a panel of 70 other unrelated individuals (Mager and Goodchild, 1989). Such rearrangements can thus be a source of genetic variability in a population. They can also result in the reversion of a phenotype that is due to the presence of a full length element. For example, an LTR-LTR recombination event which deletes most of an HERV-E element located at an amylase gene locus results in pancreatic expression of the gene while the three duplicated amylase genes associated with an intact HERV-E are expressed in the parotid salivary gland due to enhancers located within the HERV-E element (Samuelson et al., 1990; Ting et al., 1992).

The reason for the dramatic expansion of deleted RTVL-H elements among catarrhine species and the more recent expansion of the Type Ia subfamily in the great apes is not known.
We have observed that deleted elements are expressed at higher levels and in a wider range of cell types than RTVL-Hp elements (Wilkinson, 1993; Wilkinson et al., 1993), suggesting that additional mutations, probably in the LTR, may have enhanced the transcription and consequently the transposition rate of one or more precursor deleted elements. Since the LTRs of RTVL-Hp elements have not yet been characterized, the exact nature of any such mutations remains to be determined. Our laboratory has also found that the Type Ia subfamily is expressed in the widest range of cell lines (Wilkinson, 1993). Northern analyses of polyadenylated RNAs from various human cell lines with the three LTR subtype specific probes indicate that in general, Types I and II show more restricted expression than Type Ia (Wilkinson, 1993). Unit-length transcripts of 5.6 kb from Types I and II were limited to HeLa cells and one or more of the teratocarcinoma cell lines (Tera-1, Tera-2, NTera2D1). The less abundant Type Ia LTRs were the most widely expressed, with full length 5.6 kb transcripts present in all the teratocarcinoma cell lines, HeLa, 5637, and T24 cells. This was not unexpected as the LTR subtypes differ in the U3 region of the LTR which typically contains the regulatory sequences for transcription (Temin, 1982). The structure of Type Ia LTRs suggests that the initial sequence of this type arose from a recombination event between a Type I and a Type II LTR (see Fig. 3-5). This recombination event could have facilitated the expansion of this subfamily, possibly due to the novel combination of repeat sequences in the U3 region of the Type Ia LTR allowing associated elements to be expressed at higher levels.
Consistent with this hypothesis is the fact that, compared to Types I and II, Type Ia LTRs are generally the strongest transcriptional promoters in transient assays (Feuchter and Mager, 1990; unpublished observations).

Deleted elements are very common among families of endogenous retroviruses and retrotransposons (Boeke, 1989; Fredholm et al., 1991; Kuff and Lueders, 1988; Reeves and O’Brien, 1984). Two mechanisms have been proposed to explain these deletions. First, short direct repeats have been observed at the junctions of many deletions, suggesting that homologous recombination between the repeats may have resulted in the removal of intervening sequences. Alternatively, findings among infectious retroviruses indicate that the process of reverse transcription is prone to creating deletions (reviewed in Varmus and Brown, 1989). Among the three pol deletions in RTVL-H elements described here, only deletion A has clearly recognizable short direct repeats (Wilkinson et al., 1993). It would seem, therefore, that deletions B and C may have been generated during the reverse transcription of a precursor RTVL-Hp element. Deletion breakpoints are the same in different elements, suggesting that each deletion occurred only once, with subsequent expansion of that deleted element. However, it is also possible that gene conversion has served to homogenize the RTVL-H population, thereby obscuring multiple deletion events.

The two major expansions of RTVL-H elements were identified in this study through Southern analysis and library screening protocols. Evolutionary ages of retroelement subfamilies can also be estimated from the extent of sequence divergence between elements (Britten et al., 1988; Jurka and Smith, 1988; Jurka, 1989; Shen et al., 1991). Within subfamilies, RTVL-H LTRs show a high degree of homogeneity. Pairwise comparisons among 6 unlinked Type I LTRs give a % identity ranging from 87.2% to 93.3%, with none differing by more than 6% from a Type I consensus sequence (Mager, 1989).
A similar range of values is obtained from pairwise comparisons of five unlinked Type II LTRs (83.3 to 91.6% identity) (data not shown), suggesting that these two subfamilies are of similar evolutionary age. Using the intervening sequence nucleotide mutation rate of 0.15% per MYr (Miyamoto et al., 1987), the ages of the Type I and Type II LTR subfamilies can be estimated at 40 MYr and 55 MYr respectively. This estimate suggests that the Type I subfamily arose before the ape/Old World monkey divergence (30 MYr) but after the split from the New World lineage (45 MYr), and that Type II elements possibly arose before this split. The LTR subtype of the RTVL-H-related sequences detected in marmoset DNA has not been determined. Only two unlinked Type Ia LTR sequences have been isolated and sequenced. These LTRs are 95.6% identical, suggesting that this subfamily is younger than Types I and II.

Sequence differences between the 5’ and 3’ LTRs of a given element can be used to estimate the time of integration of that element, assuming that the two LTRs are identical at time of integration and that the LTRs accumulate mutations at the same rate as genomic DNA. We had sequence information for four LTR pairs (two Type I, one Type II and one Type Ia) that were also examined by PCR for time of integration. The 5’ and 3’ LTRs of the RTVL-H1 (Type I), RTVL-H2 (I), HpFor1 (II) and cH-4 (Ia) elements are 96.1, 95.7, 95.8 and 98.4% identical, respectively. Using the divergence rate of 0.15% per MYr, these data suggest that the two Type I and the Type II elements integrated 12-14 MYr ago, before the radiation of the great apes, while the Type Ia element integrated 5 MYr ago, after the human-chimpanzee divergence. In all four cases, the integration time estimated from the % sequence divergence suggested a much more recent insertion into the primate lineage than indicated by the PCR analysis (see Fig. 3-9).
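The integration times quoted above can be reproduced with a short calculation. The following is a minimal sketch, not part of the original analysis. It assumes that the two LTRs of an element diverge from each other at twice the single-lineage rate of 0.15% per MYr, because each LTR accumulates mutations independently. The factor of two is an assumption, chosen because it reproduces the 12-14 MYr and 5 MYr figures given in the text. The function and variable names are illustrative only.

```python
# Sketch: time since integration estimated from 5'/3' LTR divergence.
# Assumption (not stated explicitly in the text): the two LTRs mutate
# independently, so their pairwise divergence grows at twice the
# single-lineage rate.

RATE_PERCENT_PER_MYR = 0.15  # rate quoted in the text (Miyamoto et al., 1987)


def integration_time_myr(percent_identity, rate=RATE_PERCENT_PER_MYR):
    '''Return the estimated time since integration, in MYr.'''
    divergence = 100.0 - percent_identity  # % mismatch between the two LTRs
    return divergence / (2.0 * rate)


# 5'/3' LTR identities reported in the text for the four elements.
ltr_pairs = {
    'RTVL-H1 (Type I)': 96.1,
    'RTVL-H2 (Type I)': 95.7,
    'HpFor1 (Type II)': 95.8,
    'cH-4 (Type Ia)': 98.4,
}

for name, identity in ltr_pairs.items():
    print(f'{name}: {integration_time_myr(identity):.1f} MYr')
```

Under these assumptions the three older elements fall at roughly 13 to 14 MYr and cH-4 at roughly 5 MYr, in line with the estimates given in the text.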
Integration times determined by PCR are the more accurate as we are directly examining the different primate species for the presence of an element at a given locus. Furthermore, the PCR results obtained were generally consistent with the evolutionary ages of the subfamilies as estimated by sequence comparisons between unlinked LTRs and by hybridization analyses. The reasons for the discrepancy between the two types of estimate are not known. However, sequence divergence may not always be a reliable measure of evolutionary age. Gene conversion mechanisms may act to homogenize members of a family within a species, as has been suggested for L1 sequences (Hutchison et al., 1989). This would result in an underestimate of evolutionary age. Furthermore, integrated sequences may not accumulate mutations at a rate equivalent to that applicable to genomic sequences (Maeda and Kim, 1990; Aota et al., 1987). Maeda and Kim (1990), in an examination of the integration times of two RTVL-I elements associated with the haptoglobin gene cluster, suggested that foreign DNA, such as newly integrated viral genomes, might initially accumulate mutations faster than the surrounding genomic DNA. Supporting this suggestion is the report that the mutation rate for mouse endogenous retroviral IAP elements is 6-10X faster than for genomic DNA (Aota et al., 1987). Another potential problem which could complicate analysis is that the 5’ and 3’ LTRs may not necessarily be identical at the time of integration (Hawley et al., 1984).

In summary, this study shows that the RTVL-H family of elements, like most other HERV families, has been present in the germ line for much of primate evolutionary history. However, in contrast to other HERV families which have remained at low copy numbers, the expansion of the RTVL-H family to approximately 1000 copies and the relatively recent expansion of elements with Type Ia LTRs to 100 copies is quite dramatic.
These expansions indicate that contemporary retrotranspositions of RTVL-H elements, especially those with Type Ia LTRs, may be more probable than for other types of HERV sequences. Furthermore, the fact that these expansions have involved elements rendered translationally defective by shared deletions suggests that their retrotransposition was facilitated by factors provided in trans. Whether these factors are produced by rare coding-competent RTVL-Hp elements (Wilkinson et al., 1993) or by a heterologous source remains an unanswered question.

CHAPTER IV

RTVL-H RETROTRANSPOSITION

INTRODUCTION

The work described in this chapter was done to test the hypothesis that RTVL-H elements are bona fide human retrotransposons. Their retrovirus-like structure, the presence of target site duplications, and the high copy number and dispersion of these sequences within the genome suggest that RTVL-H did expand in the genome through retrotransposition. However, these organizational characteristics do not by themselves rule out other mechanisms for genomic amplification such as gene conversion or duplication. The long-term goal of this work was to definitively demonstrate that RTVL-H elements arose via retrotransposition and to determine if this is an ongoing process. Two basic approaches were undertaken. The first involved a strategy for isolating previously retrotransposed elements. We have detected the expression in a variety of cell lines of spliced 3.7 kb RTVL-H transcripts in which the same splice donor (SD) site and one of a cluster of splice acceptor (SA) sites are utilized (Wilkinson et al., 1990). The finding of integrated elements derived from these spliced transcripts would be strong evidence that they had transposed via an RNA intermediate. A similar strategy has been used to demonstrate copia retrotransposition (Yoshioka et al., 1991).

The second approach involved the development of several strategies designed to observe RTVL-H transposition within an experimental time frame.
The first strategy involved selecting for inactivation of a target gene and then examining that gene locus for rearrangements due to new integrations. Insertional mutagenesis has been documented for many retrotransposon families and was often how such sequences were first identified as being transposable elements (reviewed in Berg and Howe, 1989). A more direct strategy was also designed which allows for the selection of a retrotransposition event by a marked RTVL-H element. The key feature of this strategy is the “retrotransposition indicator gene”, a selectable marker gene, in this case neo^r, that has been interrupted by an antisense intron and inserted within an RTVL-H element. A functional neo gene can only be restored if the marked RTVL-H element is transcribed, spliced, reverse transcribed and integrated back into the genome (see Fig. 4-7). Similar strategies have been used by others to demonstrate the retrotransposition of Ty elements in yeast (Curcio and Garfinkel, 1991), of IAP elements in mice (Heidmann and Heidmann, 1991), and of defective retroviruses (Tchenio and Heidmann, 1991).

Here I provide evidence for the existence of an integrated element derived from a spliced RTVL-H transcript as a result of RTVL-H retrotransposition. In determining whether RTVL-H retrotransposition is an ongoing process, I present data from preliminary experiments showing the usefulness of the RT indicator cassette in selecting for retrotransposition. I also present technical difficulties encountered during attempts to demonstrate RTVL-H retrotransposition in an experimental time frame. I have been aided in this work by Doug Freeman who constructed the neo-int cassette and the JZEN/neo-int and RTVL-H/neo-int RT vectors.

RESULTS

Isolation of previously retrotransposed elements: The PCR strategy used to isolate spliced retrotransposed elements is outlined in Figure 4-1a. Primers were designed that lie upstream of the SD site and downstream of the cluster of SA sites.
PCR analysis on genomic DNA would be expected to yield a ~2 kb amplified product from intact elements. The presence of smaller products may represent amplification either from various internally deleted elements or from spliced retrotransposed elements (expected size range would be 130 bp to 310 bp depending upon which of the SA sites was used). One such PCR experiment, done using as templates a panel of human and primate DNAs, is shown in Figure 4-1b. The banding patterns obtained were relatively simple, with those for chimpanzee and gorilla being similar to that obtained for human, while the lesser apes and Old World monkeys show different banding patterns. The expected product from an intact element is not visible on the EtdBr-stained gel due to the short extension times used during the PCRs. Most lanes have a band of ~210 bp which is in the size range expected for a spliced retrotransposed element. However, it is interesting that in the human lanes only a single band within this size range is seen, despite the previous identification of the use of multiple SA sites. Of special note was the apparently polymorphic 400 bp band seen in the human DNAs tested. A larger panel of unrelated human DNAs is shown in Figure 4-1c and shows 6 of the 12 DNAs tested yielding this band, suggesting that the corresponding element is widespread in the human population but is not yet fixed.

Figure 4-1. The detection of spliced genomic RTVL-H elements. (A): PCR strategy used to detect spliced genomic elements. Shown is a full length RTVL-H element (top) and a theoretical spliced copy (below). The open boxes represent the LTRs. The splice donor site (SD) and the cluster of splice acceptor sites (SA) are indicated. The arrows represent the positions of the primers used. (B): EtdBr-stained gel of the PCR products obtained using the strategy shown in (A) and using as templates a panel of human and primate DNAs. Lane 1, no DNA control; 2-6, human (4-6 represent inbred populations: Amerindian, Melanesian, Pygmy); 7, chimpanzee; 8, gorilla; 9, gibbon; 10, orangutan; 11, baboon; 12, African green monkey; 13, marmoset. (C): EtdBr-stained gel of PCR products obtained from a larger panel of human DNAs using the strategy shown in (A). The apparently polymorphic ~400 bp band is indicated.

Thirteen PCR products (four from human, five from gibbon and four from orangutan) were subcloned and sequenced, and the sequences obtained were compared to that of an intact RTVL-H element, RTVL-H2 (Mager and Freeman, 1987). The polymorphic band in human, three gibbon products and one orangutan product may be derived from spliced retrotransposed elements (see Figure 4-2). The sequence of the human fragment extends from the 5’ primer to the SD site, where the homology jumps 1673 bp to a site just upstream of the cluster of previously identified SA sites. The gibbon and orangutan clones also extend from the 5’ primer to the SD site where homology jumps to the region of the SA sites. For two of the gibbon clones and the orangutan clone, homology resumes at previously identified SA sites, with one gibbon clone and the orangutan clone using the same SA site. There are some sequence differences between these latter two clones but these could represent sequence divergence from a common ancestral spliced retrotransposed element in the gibbon/orangutan branch. An examination of the RTVL-H2 sequences corresponding to the SA sites identified in the PCR clones showed a good match to the consensus SA sequence in only two of the five cases (HUM-S1 and GIB-S2, see Fig. 4-2). However, RTVL-H2 is only one of approximately 1000 RTVL-H elements and it is possible that the particular elements from which these spliced sequences were derived show a better match to the consensus at these sites.

The ~210 bp and 600 bp fragments present in human were also sequenced.
The 600 bp clone extends from the 5’ primer through the SD site to a point 326 bp further downstream, where homology jumps to resume at a position within the cluster of SA sites. The 210 bp clone also extends beyond the SD site for 149 bp and then jumps to a position just downstream of the SA sites. Thus, the 210 bp and 600 bp fragments do not appear to be derived from spliced re-integrated elements as sequence loss does not coincide with the SD site. It is possible that other SD sites exist, though in previous work only this SD site was identified (Wilkinson et al., 1990). On the other hand, the existence of various deleted elements has been suggested by Northern analysis and from the analysis of cDNA clones (Wilkinson et al., 1990).

Figure 4-2. Sequence alignments showing (A) the 5’ splice junctions and (B) the 3’ splice junctions of the fragments generated using the PCR strategy outlined in Fig. 4-1A. The fragments (HUM-S1, GIB-S1, GIB-S2, GIB-S3 and ORANG-S1) are aligned against the sequence of RTVL-H2 (Mager and Freeman, 1987). The consensus SD and SA sequences are also shown. The asterisks indicate previously identified SA sites (Wilkinson et al., 1990).

The 210 bp clone seen in humans also appears to be present in gibbon. A gibbon 210 bp clone had the same deletion breakpoints as the human 210 bp clone and also had the same 30 bp insert relative to RTVL-H2.
This would suggest that this deleted element was present in an ancestor to gibbons and humans.

These results suggest that putative spliced RTVL-H elements are present in the genome of humans and other primates and could represent the reverse transcription and re-integration (i.e. the retrotransposition) of processed RTVL-H transcripts. To prove that these fragments are indeed derived from such spliced retrotransposed elements, I examined each element for the presence of an intact 5’ LTR using the PCR strategy outlined in Figure 4-3. RTVL-H transcription begins at the 5’ boundary of the 5’ LTR R region and terminates at the 3’ boundary of the 3’ LTR R region with the addition of a polyA tail (Wilkinson et al., 1990). Thus, an RTVL-H transcript does not have intact 5’ and 3’ LTRs (Fig. 4-3). During the process of retrotransposition, reverse transcription occurs in a very defined way which results in the LTRs being regenerated (see Introduction, section 1.1.2, p. 46; Fig. 4-3, pathway A). Cellular transcripts can also be reverse transcribed nonspecifically; reintegration of such reverse transcribed sequences gives rise to processed pseudogenes (Weiner et al., 1986). RTVL-H transcripts may also go through this non-specific reverse transcription and reintegration, giving rise to processed RTVL-H pseudogenes that have the structure of RTVL-H transcripts, i.e. incomplete LTRs and a 3’ polyA tail (Fig. 4-3, pathway B). The PCR strategy helps to distinguish between a true retrotransposed spliced element and an RTVL-H processed pseudogene based on the fact that only a retrotransposed element will have intact LTRs. Pairing a primer that spans the splice junction with either a U3- or a U5-specific primer will yield an amplified fragment of the appropriate size if the element has intact LTRs (Fig.
4-3, pathway A) while only the U5/junction primer pair will yield an amplified product if the element is a processed pseudogene (Fig. 4-3, pathway B).

Figure 4-3. PCR strategy designed to distinguish between (A) a retrotransposed RTVL-H element and (B) a processed RTVL-H pseudogene. Solid lines represent RTVL-H sequences; broken lines represent unrelated cellular DNA. Open boxes represent the LTRs; the U3, R and U5 regions are indicated. The large arrowheads indicate a target site duplication. The small arrows indicate primers used in PCR. The expected PCR results for each scenario are also given. Other abbreviations: SD, splice donor site; SA, splice acceptor sites; Jxn, splice junction.

Diagnostic PCR amplifications:
Primers    Pathway A    Pathway B
U3/Jxn     yes          no
U5/Jxn     yes          yes

This PCR strategy was used to examine the structure of the element from which the apparently polymorphic PCR product in human was derived. The human DNA from GM10540 (Melanesian), previously identified as having this PCR product using the original SD/SA primer pair (see Fig. 4-1), was compared with DNA from GM10968 (Amerindian) which had not yielded this PCR product. PCRs with a U3/splice junction and/or a U5/splice junction primer pair(s) should yield a unique PCR band of ~600 bp and/or ~180 bp respectively (based upon the sequence of RTVL-H2; Mager and Freeman, 1987) with the GM10540 DNA as template. However, multiple bands were obtained with both DNAs and no unique band of the expected size was seen using either primer pair with the GM10540 DNA, even when PCRs were performed using an annealing temperature equal to the calculated Tm of the splice junction primer [Tm = 4°(C+G) + 2°(A+T)]. Two
T’.vojunction primers were tried, one that spanned the junction 10 bp on each side, the otherthat spanned the junction 5/15 bp. These were paired with two different U3 primers. Allprimer pair combinations gave inconclusive results. The PCRs were being set up at roomtemperature and I reasoned that at this low temperature, the junction primer may bebinding nonspecifically to intact elements. Extension from these inappropriately primedelements, before the tubes were put into the thermocycler, could obscure the expecteddifference between the 0M10540 and GM10968 DNAS. A “hot start PCR’ (D’Aquila et al.,1991), in which the Taq polymerase was not added to the reactions until during the first95oc denaturation step, was tried and yielded more clear results. The expected amplifiedproduct using the U5/splice junction primer pair was seen with 0M10540 DNA and notGM10968 DNA. Neither DNAs yielded the expected product with the U3/splice junctionprimer pair (Fig. 4-4). That a unique amplified product could be obtained only with theU5/junction primer pair suggests that this element represents an RTVL-H processedpseudogene. However, it is also possible that the U3 region of this particular element issufficiently diverged from the sequences used to design the U3 primers that the primers didnot bind. For the three gibbon clones, the expected amplified band was also obtained only139bp1000-500-200-U3/SJ U5/SJ12 3 12 3Figure 4-4. Analysis of the putative spliced RTVL-H element identified in certain humanDNAs. Shown are the results of the PCR strategy outlined in Fig. 4-3, theEtdBr-stained gel of the PCR products and a Southern blot of that gelhybridized to the 5’ internal RTVL-H probe. PCRs using the U3/splice junctionand U5/splice junction primer pairs are indicated. Lane 1, no DNA control; 2,GM10968 (Amerindian); 3, GM10540 (Melanesian). The expected amplifiedproduct using the US/splice junction primer pair is indicated by the arrow.12 312 3140A123 bp-1000-500-200Figure 4-5. 
Analysis of the putative spliced RTVL-H element identified in orangutan DNA.(A): Shown is the EtdBr-stained gel of the products obtained using the PCRstrategy outlined in Fig. 4-3 and orangutan genomic DNA as template. Lane 1,U3-l/Jxn primers; 2, U3-2/Jxn primers; 3, U5/Jxn primers. The expectedproducts for the two different U3 primers each paired with the splice junctionprimer and for the U5/splice junction primer pair are indicated by the arrows.(B): Sequence comparison of the orangutan 5’ LTR with the Type II LTR ofPB-3 (Máger, 1989). The 31 bp gap in the orangutan LTR is due to this LTRhaving one less Type II repeat unit than the PB-3 LTR which has five Type IIrepeats. Note that the first 22 bp of the ORANG-Si LTR sequence (overlined)are from the U3-specific primer used in the PCR and thus may not be identicalto the actual orangutan clone.-(C): Sequence comparison of the orangutan 5’ internal sequence with that ofRTVL-H2 (Mager and Freeman, 1987). The position of the LTR is indicated.The PBS is underlinedloverlined.141BORANG-S 1 1 TGTCAGGCCTCTGAGCCCAAGCTAACCCATCGTANNCCCAGTGACCTQCA 501WPB -3 I TCTCAGGCCTCTGAGCCCMGCCG . CATCCCATCCCCTGTGATTTGCA 4951 CGTATACATCCAGATGGCCTGAAGCAACTGAAGAT . CACGAAACAAGTGA 9950 CGTATACATCCAGATGGCCTGTCTGAGATCCAC GAAGTM.. 99100 AAATAGCCTTAACTGATGACArrCCACCATTGTGATTTATTTCTNNCCCA 149liii 11111111 1111111 III I I I I:: I I100 AA1TAGCCTTACTGATQACATTCCACCATTGTGATTTGTTCCTGCCCCA 149150 ACCTAACTGATCAAT GTAC 168I II 1(111111 I I150 CCCTACTGATCAATGTACTTTCTAATCTCCCCCACCCTTAAGAAGGTAC 199169 TTTGTAGTCNNNCCNCCCAGAAGrTCrrTGTAATTCTTCCCACCC 218200 TTTGTAATCTTCCCCACCCTTAGAGGVrCTTTGT?.ATTCTCCCCACCC 249219 TTCAGAATGTACTTTGTQAGATCCACCCCCTGCCCCCAAPACATTGCTCT 268II II I II250 TTGAGATGTACTTTGTGAGATCCA. CCCTGCCCACA?ACATTGCTCT 297269 TCTCCACTCCCTATCCCAAPACCTCTGAACTAATAATGATATCCA 318298 TCTTCACCGCCTACCCCCTATGC. 
.TAATGATTCCA 344319 CCACCCTTTCCTGACTCTC’TTCGGACTCAGCCCCCCTGCACCTAGGTG 368345 TCACCCTTCCCTGACTCTCTTTTcGGACTCACCCCACCTGCACCCAGGTG 394369 AAATAAACAGCTGTGTTGCTCACACAAACC’GTTTGGTAGTCTCTTCAC 418395 AAATACAGCTTTATTGCTCACACAAACCCTGTrrGGTGGTCTCTTCAC 444419 ACGGACACATAAGACA 434II I I I445 ATGGATGCACATGGAA 460C________________________________444 454 464‘rrTGGTGCCGAAGACCCGGCTCAGCAGGACRTV L- H 2_______TTTGGTGCTG—TGACTCAGATCGGGGGACC460 470474 484 494 504 514TCCTTTGCGAGACCAGTCCACTGTCCTCACccTAcCTCCGTG -AAGAGATCCACCTACGATCCCTTGGGAGATCTCCCCTCTCCTGTTCTTTGCTCCGTGAAAAAGATCCATCTATGA480 490 500 510 520 530524 534 544 554 564 574CCTCGGGTCCTCAGACCAACCAGCCCAAGGAACATCTCACCGATTTTAAATTGGIIICCTTAGGTCTTCAGACCCACCAGCCCAAGGAACATCTCACCAATTTTAAATCGGGTAAGC540 550 560 570 580 590142with the U5/junction primer pair (data not shown). Only with the orangutan clone were theexpected amplified products obtained with both U3/junction and U5/junction primercombinations (Fig. 4-5a). Sequencing of these PCR fragments revealed an intact 5’ LTRsuggesting that this orangutan clone does indeed represent a spliced retrotransposedRTVL-H element. Sequence comparisons of the orangutan 5’ LTR with other RTVL-H LTRsindicated that the orangutan LTR is Type II (Fig. 4-5b). Comparing the 5’ internalsequence of the orangutan clone with that of RTVL-H2 (Mager and Freeman, 1987)revealed that the PBS region of this element is diverged, being a closer match forphenylalanine tRNA than for histidine tRNA (Fig. 4-5c). It may be that this divergence isunique to this element. However, this is the only Type TI-associated PBS that has beensequenced to date and it is possible that this RTVL-H subfamily is associated with adifferent tRNA PBS. Further proof that this orangutan clone represents a splicedretrotransposed element would be to isolate the genomic locus and to look for the presenceof a target site duplication of 5 bp which is typical of RTVL-H elements. 
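The logic of the diagnostic PCR described above (Fig. 4-3) and the rule used to set the annealing temperature of the junction primer can be summarized in a few lines. The following is an illustrative sketch only, not part of the original work. The function names and the 20-base example primer are hypothetical, and the outcome labelled inconclusive is an assumption covering results that neither pathway predicts.

```python
# Sketch of two steps from the diagnostic PCR analysis.
# All names and the example primer below are hypothetical.


def wallace_tm(primer):
    '''Tm in degrees C by the rule quoted in the text: 4(G+C) + 2(A+T).'''
    seq = primer.upper()
    gc = seq.count('G') + seq.count('C')
    at = seq.count('A') + seq.count('T')
    return 4 * gc + 2 * at


def classify_spliced_element(u3_jxn_band, u5_jxn_band):
    '''Interpret the presence of the expected U3/Jxn and U5/Jxn products.'''
    if u3_jxn_band and u5_jxn_band:
        # Pathway A: LTRs regenerated, so both primer pairs amplify.
        return 'retrotransposed element'
    if u5_jxn_band:
        # Pathway B: transcript-like structure lacking U3 at the 5-prime end.
        # Caveat from the text: a diverged U3 region could also prevent
        # the U3 primers from binding.
        return 'processed pseudogene'
    return 'inconclusive'


# Hypothetical 20-base junction primer with 10 G/C and 10 A/T bases.
print(wallace_tm('ACGTACGTACGTACGTACGT'))

# Results reported in the text for the expected products.
print(classify_spliced_element(False, True))  # human GM10540, gibbon clones
print(classify_spliced_element(True, True))   # orangutan clone
```

The classification follows the expected-results table of Fig. 4-3. It says nothing about band specificity, which in the text required a hot start PCR to resolve.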
The target site duplication resulting from the nonspecific integration of a processed pseudogene would be variable in length (Tchenio et al., 1993).

Detection of transposition through target gene disruption: One strategy used to attempt to identify de novo RTVL-H transpositions was to screen for the disruption of a target gene. The HPRT (hypoxanthine guanine phosphoribosyl transferase) gene was chosen as the target gene because (1) loss of HPRT activity can be selected for in the presence of 6-thioguanine (6-TG), (2) it is on the X chromosome and thus will be present in only one copy in male-derived cell lines, and (3) it is a large genomic locus, having 9 exons within a 44 kb expanse of genomic DNA (Stout and Caskey, 1985). Two cell lines were used, the teratocarcinoma cell line NTera2D1 and the bladder carcinoma cell line 5637. Both are male derived and both express full length RTVL-H transcripts (Wilkinson et al., 1990). Also, NTera2D1 cells have been shown to have reverse transcriptase activity associated with a macromolecular complex having the characteristics of a viral-like particle (Deragon et al., 1990). Experiments were done selecting for spontaneous HPRT mutants by culturing cells in the presence of 6-TG. It was recognized that most HPRT mutations would be point mutations or small deletions. I therefore used Southern analysis to look for gross rearrangements that might represent de novo insertions. In total, 49 NTera2D1 and five 5637 HPRT- mutants were examined by Southern analysis and a representative Southern blot is shown in Fig. 4-6.

Figure 4-6. Representative Southern blot of HPRT mutant clones. Shown is a panel of 11 spontaneous NTera2D1 HPRT mutant DNAs digested with EcoRI and probed with the HPRT cDNA. The lane marked ‘N’ contains a normal NTera2D1 DNA sample. [Lanes: N, 1-11. Size markers (kb): 20.0, 9.4, 5.8, 4.4, 2.3, 1.6.]

No banding differences were seen with any of the mutants. Backselection of several clones in HAT medium did not lead to extensive cell death, suggesting that these clones were not true HPRT mutants.

Detection of transposition through the use of a “retrotransposition (RT) indicator gene”: The main drawback with using target gene inactivation is that it is not possible to select for insertional mutagenesis. Most mutations will be point mutations and small deletions. The use of an RT indicator gene allows for the selection of a retrotransposition event. The strategy I have employed is outlined in Fig. 4-7a. The neoʳ gene has been interrupted by an intron flanked, on the antisense strand, by SD and SA sites. If this “neo-int” cassette is inserted into a retrotransposon in the reverse orientation, the intron will be removed upon transcription of the retrotransposon and splicing of the RNA transcript. Reverse transcription of this transcript should generate a DNA structure with a restored neo gene, and a cell containing a retrotransposed copy should thus become resistant to the neomycin analogue G418.

This RT indicator gene is similar to one devised by Heidmann et al. (1988) except that in the latter case, the neoʳ indicator gene was separated from its promoter by an antisense intron containing sense strand polyadenylation signals (Fig. 4-7b). Unless the intron is removed, transcripts will be prematurely polyadenylated and no gene product will be produced. One disadvantage of this latter design is that deletion of the polyadenylation signals through mechanisms other than splicing can occasionally restore functional expression of the marker gene (Heidmann et al., 1988). In fact, only two of the 37 G418-resistant events analyzed had a correctly generated splice junction (Heidmann et al., 1988). Such an indicator gene is therefore not optimal if the rate of retrotransposition is very low, as is predicted for RTVL-H elements. 
[Figure 4-7 diagram. Panel A labels: promoter; SD; SA; neo nonfunctional due to antisense intron; transcription from LTR; reverse transcription; neoʳ expression restored by retrotransposition. Panel B labels: promoter; poly A signal; SD; SA; (neo)RT, Heidmann et al., 1988; premature polyadenylation unless intron removed during retrotransposition.]

Figure 4-7. (A): Strategy to detect new retrotranspositions through the use of a retrotransposition (RT) indicator gene cassette. Symbols: thick lines, genomic DNA (solid, RTVL-H sequences; broken, unrelated cellular DNA); solid boxes, LTRs; open boxes, neoʳ sequences; diagonally striped box, antisense intron; stippled box, TK promoter for neoʳ; thin dashed line, RTVL-H/neo-int transcript. Primers used for PCR analysis are indicated by the small arrows. (B): Comparison between my neo-int RT indicator gene cassette and (neo)RT of Heidmann et al. (1988). The main difference is that in (neo)RT, the antisense intron is not within the neoʳ coding sequence but separates the neoʳ gene from its promoter. The intron contains a polyA signal that results in premature polyadenylation of the TK-neoʳ transcript. Symbols: as in (A).

For these instances, it is advantageous to have the intron within the coding region as we have done, to minimize “background” gene activation due to non-specific deletion of intronic sequences. This strategy was also employed by Curcio and Garfinkel (1991) in a study of Ty retrotransposition. In their RT indicator gene construct, they interrupted the coding sequence of the yeast HIS3 gene with an antisense intron. In contrast to Heidmann et al. (1988), all 43 His+ events analyzed indicated correct splicing (Curcio and Garfinkel, 1991). A retrotransposition indicator sequence, in which the neoʳ coding sequence has been disrupted by a functional intron, has also been constructed by Schwartz et al. (1993). 
Interestingly, and in contrast to my vector and others, the intron in their construct was inserted in the sense orientation so that it can be spliced out of the transcript originating from the neoʳ promoter. However, Schwartz et al. (1993) have included a step in their assay in which genomic DNA from the transfected mammalian cells is introduced into bacteria. Only the splicing of retroviral mRNA and its subsequent re-insertion into the genome will result in a spliced genomic copy of the neoʳ gene that will be active in bacteria.

Testing the neo-int cassette: As an initial test, the neo-int cassette was inserted into the JZEN retrovirus vector (Johnson et al., 1989) and then cotransfected, along with a plasmid conferring resistance to hygromycin, into the mouse packaging cell line GP+E86. The cells were grown in hygromycin and 12 hygromycin resistant colonies were picked for further analysis. DNA was isolated from each colony and PCR performed using primers that flank the intron (see Fig. 4-7a) to determine if they had incorporated a JZEN/neo-int construct. The expected size amplified fragment of 1.2 kb was obtained from 10 of the 12 colonies, indicating that one or more copies of the JZEN/neo-int construct had integrated. In addition, 5 of these 10 DNA samples also had a fragment of 260 bp, which is the size predicted for a spliced copy. This suggests that some of the cells from the 5 colonies had acquired a spliced integrated form. The 10 positive clones were also each grown to ~2 x 10⁶ cells and then placed in G418 to select for neo expressing cells. For the 5 clones that had PCR evidence of spliced DNA copies, the dishes became confluent, indicating that a significant number of neoʳ expressing cells existed in the culture. G418-resistant colonies were also obtained from 2 of the remaining 5 clones. PCR generated fragments from 3 independent G418ʳ clones were sequenced and all had the predicted sequence shown in Fig. 2-1a (Chapter II, p. 87), indicating correct splicing of the intron. 
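
The PCR screen described above reduces to a size rule: with the intron-flanking primers, an unspliced neo-int copy gives a 1.2 kb product and a spliced copy gives a 260 bp product, which implies that about 940 bp is removed by splicing. The Python sketch below is illustrative only and is not part of the original analysis. The function names and the 10% size tolerance are assumptions introduced here; only the two expected product sizes come from the text.

```python
# Illustrative sketch: classify PCR products from the intron-flanking primers.
# Only the two expected sizes come from the text; the tolerance is an assumption.
UNSPLICED_BP = 1200  # product expected from an unspliced neo-int copy
SPLICED_BP = 260     # product expected once the intron has been removed


def classify_amplicon(size_bp, tolerance=0.10):
    # Return 'unspliced', 'spliced' or 'unexpected' for an observed band size.
    expected_sizes = (('unspliced', UNSPLICED_BP), ('spliced', SPLICED_BP))
    for label, expected in expected_sizes:
        if abs(size_bp - expected) <= tolerance * expected:
            return label
    return 'unexpected'


def implied_intron_bp():
    # Length removed by splicing, implied by the two expected product sizes.
    return UNSPLICED_BP - SPLICED_BP


if __name__ == '__main__':
    for band in (1180, 270, 600):
        print(band, classify_amplicon(band))
    print('implied intron length (bp):', implied_intron_bp())
```

A clone that shows both bands, as 5 of the 10 positive clones did, would be scored as carrying unspliced and spliced integrated copies in different cells of the population.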
These results indicate that neo gene expression has been restored by splicing, reverse transcription and reintegration in some cells from a high proportion (7/10) of the JZEN/neo-int containing clones. The spliced integrated copies could have resulted from intracellular retrotransposition since this has been shown to occur for defective retroviruses (Heidmann et al., 1988). However, since retroviral RNAs can be packaged within the GP+E86 cells, some of the functional neo genes could have been acquired through infection by a virion harbouring a spliced RNA. In either event, the results demonstrate the utility of the neo-int cassette in selecting for cells that have integrated a spliced neoʳ gene. In a parallel experiment in which the neo-int cassette was inserted into JZEN in the reverse orientation, no PCR evidence for spliced DNA copies and no G418-resistant colonies were obtained. This indicates that the presence of the intron, which in this latter case is in the ‘wrong’ orientation for splicing from the LTR promoted transcript, effectively blocks neo expression. This indicator gene cassette should therefore provide a stringent direct selection for RNA-mediated transpositions and hence be useful in the analysis of bona fide or suspected mammalian retrotransposons.

Choosing the donor RTVL-H element: The success of an assay for RTVL-H transposition depends upon a transpositionally competent donor RTVL-H element in which to insert the RT indicator cassette. All RTVL-H elements that have been sequenced to date contain multiple deletions and stop codons within the coding regions of the element and are thus not competent for autonomous retrotransposition. However, defective retrovirus constructs can be complemented in trans (Tchenio and Heidmann, 1991) and thus our strategy was to identify a structure that might still be functional in cis for retrotransposition and to introduce it into the appropriate cells that can supply the necessary protein functions. 
One problem with this strategy is that there is no way to identify a priori a transpositionally competent element unless a recently transposed element has been identified through insertional mutagenesis. This was the case for Heidmann and Heidmann (1991) in demonstrating IAP retrotransposition, as they were able to use the IAP element recently inserted close to the interleukin-3 gene in the WEHI-3B tumor cell line (Ymer et al., 1985). Given that the RTVL-H family has been relatively stable throughout most of primate evolutionary history (see Chapter III), I suspect that most elements are transpositionally inactive. Of the several full length cloned elements available, I decided upon the one Type Ia element available, cH-4, for several reasons. First, transient CAT assays have shown strong promoter activity for Type Ia LTRs in a wide variety of cell lines (Feuchter, 1991; Feuchter and Mager, 1990). Also, endogenous expression of this subfamily is detected in many different cell types (Wilkinson, 1993). Transcription of the marked element is the first step needed for transposition. Second, the Type Ia subfamily appears to represent a more recently expanded RTVL-H subfamily (see Chapter III). It is therefore more likely that a Type Ia element is still transpositionally competent. The 3’ LTR of the cH-4 element had been previously tested for promoter activity and had been found to be active in several different cell lines (A. Feuchter, personal communication). Sequence comparisons of the 5’ and 3’ LTRs showed them to be 98.4% identical. However, several base differences were of note. (1) There is a putative Sp1 site present in the 3’ cH-4 LTR and in another strong Type Ia LTR promoter (H6, Feuchter and Mager, 1990) and not present in the other LTR subtypes. This site has been hypothesized but not proven to contribute to the strong promoter activity of the Type Ia LTRs. This site is mutated in the 5’ cH-4 LTR (3’ LTR CCCCCGCCCC vs 5’ LTR CCCCTGCCCC). 
Because of this difference, the 5’ cH-4 LTR was checked for promoter activity in transient CAT assays in GP+E86 cells and in NTera2D1 cells. In NTera2D1 cells, the 5’ and 3’ LTRs had comparable promoter activities (data not shown). In GP+E86 cells, the 5’ LTR had weaker activity than the 3’ LTR, but I considered the activity sufficient for initial experiments. (2) A second, possibly critical, base difference was the 5’ most base of the 3’ LTR: 3’ LTR CGTCA... vs 5’ LTR TGTCA.... Upon reverse transcription of the RTVL-H transcript, the U3 region of the 3’ LTR is used as the template to reform the intact 5’ LTR. Thus, the unintegrated DNA intermediate of transposition would have this C at the 5’ end. All known retroviral LTRs are bounded by short inverted repeats which begin with ‘TG’ and end with ‘CA’ (Luciw and Leung, 1992). Deviations from this have been shown to drastically reduce integration efficiency of the double stranded DNA intermediate (Luciw and Leung, 1992 and references therein). Since the 3’ LTR of cH-4 is the only one of >20 RTVL-H LTRs sequenced to begin with “C” instead of “T”, I felt it important to change this base back to a T. This was done using the recombinant PCR strategy outlined in Fig. 2-2 in Chapter II. It should be noted that both 5’ and 3’ cH-4 LTRs end with TGAAA, not ...CA. However, the observation that almost all sequenced RTVL-H LTRs also end in ...AA suggests that they had expanded after this mutation occurred. Hence, this was not an immediate concern.

Finally, I had to consider how much of the internal RTVL-H sequence to retain in the vector. 
The element is noncoding; therefore much of the sequence probably does not need to be present (we will be relying on the required proteins being supplied in trans). What would be required, based upon RTVL-H transposing via a process similar to that of retroviruses, would be intact LTRs, the tRNA primer binding site (PBS) for DNA first strand (minus strand) synthesis, the purine rich 3’ PPT for second strand (plus strand) synthesis, and the packaging signal (5’ internal sequence). The tRNA PBS of RTVL-H elements is homologous to the 3’ end of histidine tRNA (Mager and Henthorn, 1984). Unlike other RTVL-H elements that have one or two mismatches in the PBS relative to the histidine tRNA (RTVL-H1, 17/18 matches, Mager and Henthorn, 1984; RTVL-H2, 16/18 matches, Mager and Freeman, 1987), the PBS of cH-4 is a perfect match. cH-4 also has the expected polypurine tract immediately 5’ to the 3’ LTR. The packaging signals of retroviruses have been identified through deletion analyses. RTVL-H elements are most homologous to type C viruses such as MLV, in which the packaging signal has been located to a 350 bp region just downstream of the SD site. However, the rate of packaging is increased by addition of some gag sequences. In my RTVL-H RT vector, ~2 kb of 5’ internal sequence was included, which should contain the cis-acting packaging signals. However, this is by no means certain as the specific sequence requirements are not yet known. There is an internal SD site 140 bp downstream from the 5’ LTR of the RTVL-H RT vector. This may be a problem in that splicing of the RTVL-H/neo-int transcript may occur from the RTVL-H SD site to the neo-int SA as opposed to the desired neo-int SD-SA processing event (i.e. competing events). However, for initial experiments, the RTVL-H SD site was left unchanged. 
If it does prove to be a problem, the site will have to be mutated.

Introduction of the RTVL-H/neo-int vector into a human cell line: The RTVL-H/neo-int construct was first introduced into the human teratocarcinoma cell line, Tera-1. Tera-1 was thought to be a good cell line in which to observe retrotransposition for several reasons. This cell line expresses high levels of RTVL-H transcripts including transcripts that contain an undeleted pol region (Wilkinson et al., 1993). This may indicate the presence of at least some functional RTVL-H elements that could supply the required protein functions, such as reverse transcriptase and integrase, in trans. In addition, L1 elements, some of which encode functional reverse transcriptase, are expressed in teratocarcinoma cell lines (Hutchison et al., 1989; Dombroski et al., 1991). Furthermore, retrovirus-like particles (VLPs) have been observed by electron microscopy (EM) in these cells (Lower et al., 1987) and particle association may be a requirement for efficient reverse transcription. Finally, teratocarcinoma cell lines are thought to represent an early embryonic stage (Andrews, 1984) and so may be similar in some respects to germline cells in which the expansion of RTVL-H elements must have taken place during evolution.

The RTVL-H/neo-int vector was cotransfected along with a plasmid conferring hygromycin resistance into Tera-1 cells. Sixteen hygromycin resistant Tera-1 colonies were analyzed for the presence of unrearranged RTVL-H/neo-int constructs. PCRs using primers flanking the neo intron (see Fig. 4-7a) were performed on DNA isolated from each of these clones. Only 2 of the 16 clones gave the expected 1.2 kb amplified band visible on an ethidium bromide stained gel. Interestingly, when a Southern blot of this gel was hybridized to a neo probe, the 1.2 kb fragment could be detected in almost all of the samples. This suggested that a small percentage of cells within each clonal population does contain the RTVL-H/neo-int construct. 
One possible explanation for this finding is that the construct is integrating but it is not stable and is undergoing LTR-LTR recombination. However, PCRs on selected clones done with primers derived from the plasmid vector sequences flanking the RTVL-H/neo-int element did not yield the expected size fragment for a solitary LTR. A second possible explanation is that introduction of this construct resulted in a detrimental level of new transpositions. Thus cells not having the construct would be selected for. This has been observed for experimentally induced transpositions of a marked Ty element in yeast (Boeke et al., 1985). However, there is no reason to believe that a newly integrated marked RTVL-H element would “transpose” at a higher rate than endogenous copies. Of the two clones that had the expected PCR product, further analysis by both PCR and Southern blotting revealed that neither clone contained an unrearranged copy of the RTVL-H construct. Both clones contained only one or two copies that appeared to be 5’ truncated (data not shown). A third possible explanation may be a problem in transfecting these particular cells. However, an integrated hygromycin resistance gene was easily detected in all clones by PCR (data not shown).

Introduction of the RTVL-H/neo-int vector into a mouse packaging cell line: My inability to obtain Tera-1 clones having an integrated RTVL-H/neo-int cassette was puzzling. Was the problem related to these cells already having 1000 endogenous RTVL-H copies and/or expressing high levels of endogenous RTVL-H elements? I decided to attempt to introduce the RTVL-H/neo-int vector into the mouse packaging cell line GP+E86. These cells are routinely used in retroviral gene transfer experiments and are designed to supply the required protein functions in trans. It is not known whether the murine retroviral proteins will complement the human RTVL-H RT vector. 
However, the fact that RTVL-H elements are most homologous to type C viruses such as MLV (Mager and Freeman, 1987) suggests that MLV protein products may be able to recognize RTVL-H RNAs. I have also found that the 5’ cH-4 LTR has some promoter activity in GP+E86 cells (data not shown).

The RTVL-H/neo-int construct was cotransfected into these cells along with a plasmid conferring hygromycin resistance. Twenty three isolated but not independent hygromycin resistant colonies were picked for further analysis. DNA was isolated for both PCR and Southern blot analysis. PCR analysis using primers that flank the neo intron showed that 16 of the 23 clones had the expected size amplified fragment of 1.2 kb for an unspliced integrated construct. There was no evidence of a product derived from a spliced construct on the ethidium bromide stained gel or upon Southern analysis. All clones were analyzed by genomic Southern blotting to determine the presence of unrearranged integrated constructs and to estimate relative copy numbers of integrated constructs. DNAs were digested with KpnI and HindIII, restriction enzymes that cut within the construct itself to yield a conserved 4.1 kb band upon hybridization. All 16 of the clones positive by PCR had the expected 4.1 kb band, indicating the presence of at least one unrearranged integrated copy of the RTVL-H/neo-int construct. These clones will subsequently be referred to as RT clones or RT cell lines. Comparing the intensity of this band with other bands in that lane, which most probably represent various rearranged single copy elements, indicates that these RT clones contain from 1 to 20 copies of unrearranged constructs per clone. The number of rearranged, integrated elements ranges from 0 to 6 copies. A representative Southern blot of several RT clones is shown in Fig. 
4-8.

Since the initial hygromycin-resistant GP+E86 cells needed to be replated at low cell density to obtain isolated colonies, the RT clones are not necessarily independent. Examination of the banding patterns obtained upon Southern analysis indicated that some RT clones were probably derived from the same transfected parent cell. There appear to be at least 9 different RT cell lines represented by the 16 RT clones. For clones showing only the conserved unrearranged band it is impossible to determine, unless there are differences in band intensity, whether these clones represent the same or independent cell lines.

RNA was also isolated from each of the RT clones for analysis by RT-PCR and by Northern blotting. RT-PCR results were inconclusive due to DNA contamination of RNA samples (data not shown). Northern analysis indicated that none of the clones expressed detectable levels of either the expected ~6.5 kb unspliced RTVL-H/neo-int transcript or the ~5.6 kb spliced transcript (Fig. 4-9). Only one clone, clone 10, showed any hybridization after a one week exposure, an ~2.7 kb band and an even weaker ~3.2 kb band. Interestingly, the transcript size expected for a splicing event involving the RTVL-H SD site and the neo-int SA site is ~3.2 kb.

Figure 4-8. Representative Southern analysis of GP+E86 RT cell lines. DNAs isolated from 10 RT clones were digested with KpnI/HindIII and probed with the 5’ RTVL-H internal probe. The conserved 4.1 kb band is indicated. [Lanes: 1-10. Size markers (kb): 20.0, 9.4, 5.8, 4.4, 2.3, 1.6.]

Figure 4-9. Northern analysis of the GP+E86 RT cell lines. Total RNAs from the 14 GP+E86 RT clones were probed with an LTR probe (one week exposure). The same filter was rehybridized to an actin probe (30 hr exposure). [Lanes: 1-14. Size markers (kb): 9.5, 7.5, 4.4, 2.4, 1.4, 0.24.]

DISCUSSION

In this chapter, work was initiated to examine RTVL-H retrotransposition. 
The questions addressed were (1) did the RTVL-H family arise via retrotransposition?, and (2) is this an on-going process?

Did the RTVL-H family arise via retrotransposition?

The approach developed was a PCR strategy to identify and isolate previously retrotransposed elements and took advantage of a natural splicing event during RTVL-H transcript processing. Splicing of retroviral transcripts is expected, giving rise to the subgenomic env transcript. The RTVL-H splice is unusual in that although it starts at a SD site in the expected location just downstream of the 5’ LTR, the splice goes into one of a cluster of SA sites just 3’ to the region of gag homology. Whether this spliced transcript serves a biological function for the element is unknown. It is possible that these transcripts are simply a consequence of the fact that most elements that are transcribed are the deleted elements that most likely have lost the SA site associated with the deleted env domain. What may be occurring is the subsequent activation of a cluster of cryptic SA sites. Such activations are seen with true retroviruses that have had the usual env SA site deleted (Armentano et al., 1987; Tchenio et al., 1993). The copia elements of Drosophila have a naturally occurring spliced transcript, with the splicing event occurring after the gag region to delete out the pol domain (copia has no env domain). This transcript is capable of encoding the proteins needed to form the cytoplasmic particles (Yoshioka et al., 1990).

A spliced integrated element was identified in the orangutan genome. This element spliced from the SD site into a previously identified SA site (Wilkinson et al., 1990) and upon further analysis was shown to have an intact 5’ LTR. One cannot entirely exclude the possibility that this element is a deleted form with deletion breakpoints that coincide with the SD and SA sites. 
However, this seems highly unlikely. Further evidence that this element indeed represents a spliced retrotransposed element would require the isolation and characterization of the genomic locus. However, even this would not rule out the possibility of the precise deletion occurring after integration by another mechanism.

One reason for the apparently very rare occurrence of spliced genomic elements is that the splicing event would be expected to result in the loss of the packaging signal. In retroviruses, the packaging signal serves to direct the genomic retroviral RNA to the particle for packaging. The subgenomic spliced env transcript loses this packaging signal because this signal is located 3’ to the SD site, at least in murine retroviruses (reviewed in Linial and Miller, 1990). If the putative packaging signal in RTVL-H is in a position analogous to that in murine retroviruses, then any spliced transcript would lack the packaging signal. However, in other systems, deletion of the packaging signal does not totally prevent packaging. Although the rate of packaging and transfer of a Ψ⁻ retroviral vector was reduced 3000-fold relative to a vector containing the packaging signal, it was still detectable (Mann and Baltimore, 1985). Therefore, spliced RTVL-H transcripts could occasionally be packaged. Packaging would then allow for efficient reverse transcription and reintegration. It is also possible that the other elements identified in the PCR strategy that were dismissed as representing deleted elements may have been aberrantly spliced transcripts that retained the packaging signal and went through the retrotransposition process.

The apparently polymorphic PCR fragment in human may represent a new integration event that is not yet fixed in the population. This integration could have come about by two possible mechanisms. One possibility is that it represents a true retrotransposition of a spliced transcript. 
However, the inability to demonstrate an intact 5’ LTR suggests that this is not the case. The lack of a U3 region instead suggests that this integration may represent a processed pseudogene. Processed pseudogenes derived from a defective MLV-based RT vector have recently been reported (Tchenio et al., 1993). However, it is also possible that the U3 region of this particular element was deleted after integration, or that it has diverged enough from the sequenced LTRs used to design the U3 primers that those primers failed to amplify it.

Is RTVL-H retrotransposition an on-going process?

The second question, whether RTVL-H transposition is an ongoing process, was addressed using two different strategies. Both were designed to detect the occurrence of a new retrotransposition event within the time frame of an experiment. The first involved looking for the disruption of a target gene, in this case the HPRT gene, and examining via Southern analysis the reason for the loss of activity. It was through such occurrences that transposable elements were initially identified (reviewed in Berg and Howe, 1989). A new insertion would be identified through a change in the banding pattern at the HPRT locus. In total, 54 spontaneous HPRT- mutants were examined and none demonstrated a banding difference. Backselection of several clones in HAT indicated that these clones were not in fact true HPRT- mutants. Very low HPRT activity (2%) is sufficient to allow survival in HAT (Gillin et al., 1972). Thus there is a question of how stringent a selection in 6-TG should be. The insertion of an RTVL-H element may not lead to complete gene inactivation. Therefore, selecting for complete gene knock-out may select for large genomic deletions. However, using a more relaxed selection will allow cells with point mutations and small deletions to survive. The overriding concern of this strategy was determining how many mutants would have to be examined because of the many different mechanisms by which HPRT- mutants can be generated, i.e. 
this selection is not specific to RTVL-H insertion. In a study looking at insertional mutagenesis of the HPRT gene in murine embryonal carcinoma (EC) cells, King et al. (1985) also found that in spontaneous HPRT- EC mutants, small lesions in the HPRT gene rather than gross alterations were the responsible mutational events. Furthermore, they found that infection of EC cells with MLV led to a several-fold increase in the frequency of mutation at the HPRT locus and that a significant number of these virus-infected HPRT- mutant lines showed gross alterations in the HPRT locus upon Southern analysis. Thus, for this type of strategy to have any chance of success in detecting RTVL-H transpositions, it would probably be necessary to induce transposition in some way. For example, for the detection of Ty transpositions in an experimental time frame, the marked Ty element was linked to an inducible GAL1 promoter. Growth of the transfected yeast on galactose induced high levels of expression of the marked element, which in turn led to high rates of transposition, not just of the marked element but also of endogenous Ty elements (Boeke et al., 1985; Curcio and Garfinkel, 1991). There are two final concerns about using the HPRT gene as a target gene for insertional mutagenesis. One concern was suggested by the work of King et al. (1985), who found that integration of MLV into the HPRT gene in F9 EC mouse cells occurred about 100-fold less frequently than would be expected if proviral insertion into cellular DNA was entirely random. This suggests that the HPRT gene may not be a favourable target for insertional mutagenesis. The second concern was that resistance to guanine analogues such as 6-TG has been reported to occur by a variety of cellular mechanisms, only one of which is mutant HPRT (Murray, 1971). 
Thus, this strategy was abandoned in favor of the second strategy that allowed for the direct selection of an RTVL-H retrotransposition event.

The second strategy, involving the use of a retrotransposition indicator gene, is the most direct way of demonstrating retrotransposition but it does have some drawbacks. The first is that it is not feasible to test more than a few marked elements and there is no way to determine beforehand which elements may be transpositionally competent. Many steps must occur successfully during the process of retrotransposition: (1) There must be transcription of the marked element. LTRs can be tested for promoter/enhancer activity in transient CAT assays but even then there is no guarantee for stable, long term expression of an integrated element. (2) There must be efficient splicing of the transcript. Preliminary experiments with the neo-int cassette in a standard retroviral vector indicated that the cassette was spliced efficiently but there is the concern of the potentially competing RTVL-H SD site. (3) There must be reverse transcription of the marked transcript, which may require the association with a VLP. Since the marked element does not encode the necessary protein functions itself, it is relying on the host cell to provide these functions in trans, possibly through the presence of a few functional RTVL-Hp elements or other endogenous elements such as L1s. The cell line Tera-1 was used as it is the only cell line identified as expressing a 6.5 kb RTVL-H transcript, the expected size for an RTVL-Hp transcript containing an intact pol region (Wilkinson et al., 1993). Therefore, this cell line is the best candidate for expressing RTVL-H protein products. However, teratocarcinoma cells also express high levels of L1 transcripts (Skowronski et al., 1988) and functional L1 elements have been identified (Dombroski et al., 1991, 1993; Mathias et al., 1991). 
These may be the source of the reverse transcriptase activity identified in these cells (Deragon et al., 1990). It is not known whether L1 proteins would recognize RTVL-H RNAs. Finally, VLPs have been observed by EM in teratocarcinoma cells (reviewed in Wilkinson et al., in press) but it is not known whether RTVL-H RNAs can be packaged in these VLPs. These same concerns regarding protein complementation between different elements also apply to the RT experiments using the GP+E86 cell line. It is not known whether an RTVL-H element can be complemented by MLV-derived protein products. The fact that RTVL-H elements are most homologous to type C viruses such as MLV (Mager and Freeman, 1987) may increase the likelihood of MLV products recognizing RTVL-H RNAs. In support of this possibility is the report that transcripts containing an MLV Ψ signal are efficiently packaged by REV, an avian retrovirus that shows some homology with MLV gag and pol genes (Luciw and Leung, 1992). (4) The marked element must possess all the required cis sequences for retrotransposition. The cH-4 element possesses a putative minus-strand PBS that is a perfect match (18/18) to the 3’ end of the histidine tRNA. There is also a putative purine-rich 3’ plus-strand PPT. The 5’ internal region that in retroviruses contains the packaging signal has been included within the RTVL-H RT vector. However, the specific sequence elements, if any, that are responsible for packaging have not been identified for retroviruses. Therefore, it is a matter of speculation as to whether the cH-4 element possesses a packaging signal. Finally, for retroviruses, the inverted repeats at the outer edges of the LTRs are required in cis for integration. Retroviral LTRs are all bounded by 5’ TG ... CA 3’. RTVL-H LTRs are bounded by short, often imperfect, inverted repeats. The majority of RTVL-H LTRs sequenced are bounded by 5’ TG ... AA 3’, which suggests that these elements amplified after this mutation occurred.
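As a rough illustration of the terminal-dinucleotide comparison made above, the following Python sketch classifies an LTR by its first and last two bases. The function name and the toy sequences are invented for illustration and are not RTVL-H sequences; in practice the LTR boundaries would first have to be defined by alignment.

```python
def classify_ltr_termini(ltr_seq: str) -> str:
    """Classify an LTR by the dinucleotides at its 5' and 3' ends.

    Hypothetical helper: only the two outermost bases at each end are
    inspected, which is a simplification of a real inverted-repeat check.
    """
    seq = ltr_seq.upper()
    if len(seq) < 4:
        raise ValueError("sequence too short to have distinct termini")
    five_prime, three_prime = seq[:2], seq[-2:]
    if (five_prime, three_prime) == ("TG", "CA"):
        return "canonical retroviral (5' TG ... CA 3')"
    if (five_prime, three_prime) == ("TG", "AA"):
        return "majority RTVL-H (5' TG ... AA 3')"
    return f"atypical (5' {five_prime} ... {three_prime} 3')"


# Toy sequences, invented for illustration only.
toy_ltrs = {
    "toy_retrovirus": "TGTCAGGCCTCA",
    "toy_rtvlh": "TGTCAGGCCTAA",
    "toy_variant": "CGTCAGGCCTAA",
}

for name, seq in toy_ltrs.items():
    print(name, "->", classify_ltr_termini(seq))
```

Any pair other than the two named in the text is simply reported as atypical, so the sketch makes no claim about whether such an LTR can still integrate.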
The 3’ LTR of cH-4 was unusual in that it is bounded by 5’ CG ... AA 3’. The fact that this LTR is the only one of >20 sequenced LTRs that begins with a C suggests that this is a recent mutation in this LTR, and one that may be deleterious to further retrotransposition of this element because of the effect this base change is expected to have on integration. This base was changed back to a T using a recombinant PCR strategy. (5) Once integrated, there must be expression of the restored neoR gene from the TK promoter. A potential problem here might be the site of integration, i.e. it may not be permissive for transcription. I have, however, demonstrated that the restored neoR gene encodes a functional protein in preliminary experiments with the JZEN/neo-int construct.

Another drawback of this general approach is that it is quite possible that the probability of any one RTVL-H element undergoing a retrotransposition event is too small to be measurable in an experimental time frame. Transposition frequencies will vary for different element families. Tchenio and Heidmann (1991) used a similar strategy to demonstrate transposition of defective retroviruses and found a transposition frequency of 10^-6. Using a marked IAP element, Heidmann and Heidmann (1991) found a frequency of transposition almost ten-fold lower. We expect the transposition frequency of RTVL-H to be very low, based upon the relative stability of the family over much of primate evolution (see Chapter III). I did choose an element that has shown more recent evolutionary expansion and thus may transpose at a higher frequency, but even Type Ia elements seem relatively stable over the last 15 MYr. Heidmann and Heidmann (1991) had an enormous advantage in that they had available a very recently inserted IAP element (Ymer et al., 1985) as the donor element for their RT vector.
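To give a feel for why a low transposition frequency is difficult to measure, the sketch below estimates how many cells would have to be placed under selection to have a given chance of recovering at least one event. It assumes, as a simplification not made in the text, that events in different cells are independent and that a quoted frequency can be treated as a per-cell probability. The function name and the 95% detection level are illustrative choices, and the two frequencies are only order-of-magnitude values of the kind quoted above.

```python
import math


def cells_needed(freq_per_cell: float, p_detect: float = 0.95) -> int:
    """Smallest N such that 1 - (1 - f)^N >= p_detect.

    Assumes independent events with per-cell probability f (a
    simplification; real selections also depend on plating efficiency
    and on the number of generations in culture).
    """
    if not 0.0 < freq_per_cell < 1.0:
        raise ValueError("freq_per_cell must lie strictly between 0 and 1")
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must lie strictly between 0 and 1")
    # log1p keeps precision when f is very small.
    return math.ceil(math.log1p(-p_detect) / math.log1p(-freq_per_cell))


# Illustrative order-of-magnitude frequencies, not measured RTVL-H values.
for f in (1e-6, 1e-7):
    print(f"f = {f:.0e}: screen about {cells_needed(f):.1e} cells")
```

Under these assumptions the required number of cells scales inversely with the frequency, so each ten-fold drop in transposition frequency calls for roughly ten times as many cells under selection.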
It will probably be important to get multiple copies of the marked element integrated unrearranged to reduce the number of cells that would have to be selected to detect a transposition event. Heidmann and Heidmann (1991) determined a frequency of 1.5 x 10^-6 events giving rise to G418-resistant clones per cell per generation. However, the clone analyzed had 20-50 integrated copies of the marked IAP element. If all were transcriptionally active, the transposition frequency of one particular element should be 20-50x less. This assumption was in agreement with the absence of G418R variants in two mycophenolic acid-resistant clones that had integrated only a few copies when 10^6 cells were selected in G418. In those two cases, selection of >10^8 cells would have been necessary to detect a transposition event. Also, it will probably be important to get high levels of expression, as work with Ty in yeast suggests that rates of transposition are correlated with rates of transcription (Boeke et al., 1985; Curcio and Garfinkel, 1991).

One question that remains unanswered is why I was unable to obtain Tera-1 clones containing the RTVL-H/neo-int construct when the same construct could be introduced into the mouse packaging cell line at relatively high frequency. There are several possible reasons. The problem may simply reflect the ability to transfect constructs into different cell types. Tera-1 is a teratocarcinoma cell line and GP+E86 is a fibroblast cell line. Transfection experiments may not be directly comparable between cell types. For example, transfection efficiencies will vary between cell types. Nonetheless, it is interesting that hygromycin-resistant clones were obtained for both cell lines. Also, PCR analysis of the Tera-1 clones indicated that an integrated hygromycin resistance gene was present in all cases.
However, while a high percentage of GP+E86 hygromycinR clones (16/23) also had one or more copies of the RTVL-H/neo-int construct, only 2/16 Tera-1 clones had RTVL-H/neo-int sequences present. Possibly the high expression of endogenous RTVL-H elements is interfering with integration in some way. However, this was not seen in similar experiments performed by Heidmann and Heidmann (1991) using a marked IAP element that was introduced into a cell line that expressed high levels of endogenous IAP transcripts. Another possibility is that the high copy number is interfering in some way, i.e. promoting some type of recombination or gene conversion process leading to the loss of the neo-int cassette. However, again this was not seen in the above-mentioned IAP experiments (Heidmann and Heidmann, 1991), in which the copy number of endogenous IAPs in mouse cells is similar to RTVL-H copy number in humans. I did examine the possibility that the element was unstable and possibly was undergoing LTR-LTR recombination. This was suggested by the observation that upon hybridization of several PCRs of Tera-1 hygromycinR clones, these clones were positive for neoR sequences, indicating that a small percentage of cells within those populations contained the neo-int construct. However, a PCR experiment designed to detect the presence of solitary LTRs derived from the RTVL-H RT vector did not yield the expected amplified product from the several clones examined, suggesting that LTR-LTR recombination was not occurring. There is another possibility. It has been observed during similar experiments in yeast that inducing transcription of a marked Ty element to high levels resulted in such a high rate of transposition that it proved lethal to cells (Boeke et al., 1985). However, with my RT vector, RTVL-H transcription is being driven by its natural promoter and therefore expression and hence transposition would not be expected to occur at a rate any different from that of endogenous elements.
It is possible that the introduced vector is not “controlled” by the cell in the same way as endogenous copies are. For example, endogenous elements may be located in transcriptionally inactive regions of the genome and/or may be heavily methylated. In contrast, the introduced vector may integrate into actively transcribed regions. However, endogenous elements are expressed, especially in teratocarcinoma cell lines. This may, however, represent low levels of expression from many elements, whereas the introduced vector may itself be expressed at high levels. At present, there is no satisfactory explanation for my inability to get the RTVL-H/neo-int construct into Tera-1 cells.

CHAPTER V

AN RTVL-H LTR PROVIDES A POLYADENYLATION SIGNAL TO A NOVEL ALTERNATIVELY SPLICED TRANSCRIPT IN NORMAL PLACENTA

The data presented in this chapter have been incorporated into the following manuscript: Goodchild, N., Wilkinson, D. and Mager, D. 1992. A human endogenous long terminal repeat provides a polyadenylation signal to a novel alternatively spliced transcript in normal placenta. Gene 121: 287-294.

INTRODUCTION

The work presented in this chapter describes one possible impact that RTVL-H elements can have on the host genome. It is now widely accepted that retroelements can play a significant role in the genetics of an organism. Long recognized as a major source of mutagenic activity in lower eukaryotes (for review see Berg and Howe, 1989), retroelements are now the subject of rapidly accumulating reports of associated mutations resulting in disease in man. These mutations may represent element-mediated rearrangements (Hobbs et al., 1986; Lehrman et al., 1987; Rouyer et al., 1987; Markert et al., 1988) or de novo insertions (Kazazian et al., 1988; Morse et al., 1988; Muratani et al., 1991; Wallace et al., 1991; Miki et al., 1992; Goldberg et al., 1993; Narita et al., 1993; Vidaud et al., 1993).
In addition to insertional inactivation, the presence of a retroelement can affect the expression of an adjacent gene without destroying its function. Retroelements contain transcriptional regulatory sequences that can potentially promote and/or enhance the expression of a nearby gene (Ymer et al., 1985; Canaani et al., 1983; Banville and Boie, 1989; Chang-Yeh et al., 1991), or can be used to polyadenylate an unrelated transcript (Baumruker et al., 1988; Kress et al., 1984; Harendza and Johnson, 1990; Paulson et al., 1987).

RTVL-H LTRs contain regulatory sequences that direct the expression of the associated RTVL-H element (Wilkinson et al., 1990), but can also promote, enhance and polyadenylate heterologous transcripts (Feuchter and Mager, 1990; Mager, 1989). In recent studies, our laboratory (Feuchter et al., 1992; Feuchter-Murthy et al., 1993) and others (Liu and Abraham, 1991) have identified unrelated cellular transcripts that have been promoted by an RTVL-H LTR. Here I report that an RTVL-H LTR contributes a polyadenylation signal to a novel transcript found in normal human placenta. I have been aided in this work by Dave Wilkinson, who performed the initial characterization of the placental cDNA clones analyzed here and who provided the RNA samples used in the Northern analysis of PLT-related sequences (Fig. 5-5), and by Betty Yip, who assisted in the isolation and characterization of some of the cDNA clones shown in Fig. 5-8.

RESULTS AND DISCUSSION

Identification of non-RTVL-H cellular transcripts polyadenylated within RTVL-H LTRs: The strategy used for isolating cellular transcripts polyadenylated within an RTVL-H LTR is shown diagrammatically in Fig. 5-1. It is based on the differential hybridization patterns obtained when using probes specific for distinct regions of the RTVL-H element.
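The screening logic used in this section can be summarized as a lookup on which probes a cDNA clone hybridizes to. The Python sketch below is only an illustration of that logic: the function name and category labels are hypothetical, and it assumes that a transcript initiating in an LTR carries the U5 region but not U3, whereas a transcript polyadenylated in an LTR carries U3 but not U5.

```python
def classify_clone(u3: bool, u5: bool, internal: bool) -> str:
    """Interpret a clone's hybridization pattern with three RTVL-H probes.

    Each argument is True if the clone hybridizes to the probe specific
    for that region (U3, U5, or the internal RTVL-H sequence).
    """
    if u3 and u5 and internal:
        return "autonomous RTVL-H transcript"
    if u5 and not u3 and not internal:
        return "candidate LTR-promoted cellular transcript"
    if u3 and not u5 and not internal:
        return "candidate cellular transcript polyadenylated within an LTR"
    if not (u3 or u5 or internal):
        return "no RTVL-H sequences detected"
    return "unclassified pattern"


# Example patterns as (U3, U5, internal).
for pattern in [(True, True, True), (False, True, False), (True, False, False)]:
    print(pattern, "->", classify_clone(*pattern))
```

A pattern only nominates a candidate; as in the work described here, the termini of each clone still have to be sequenced to confirm where the LTR sequence lies.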
A prototypical LTR consists of three subregions: U3, containing the transcriptional promoter signals; R, defined at the 5’ end by the transcriptional start site and at the 3’ end by the site of poly(A) addition, and containing the poly(A) signal; and the U5 region (Temin, 1982). As shown in Fig. 5-1A, autonomous RTVL-H transcripts would hybridize to probes specific for U3, U5 and internal regions. Such transcripts are abundant in normal placental amnion and chorion and in some transformed cell lines, particularly teratocarcinoma cells (Wilkinson et al., 1990). In a search for heterologous genes that may utilize an RTVL-H LTR as a promoter, our laboratory has recently isolated and characterized several candidate cDNA clones that hybridize to a U5 probe but not to any other RTVL-H derived probe (shown schematically in Fig. 5-1B) (Feuchter et al., 1992). We have also been interested in identifying cellular genes that utilize an RTVL-H LTR as a polyadenylation signal. cDNA clones derived from the transcripts of such genes would hybridize only to a U3 specific probe as shown in Fig. 5-1C.

In a previous study, our laboratory isolated two cDNA clones from different sources containing an RTVL-H LTR that had contributed a polyadenylation signal to the corresponding transcript (Mager, 1989). However, those cDNAs contained no ORF and expression of the transcripts in normal cells was not detected. Eight additional clones having the hybridization pattern indicated in Fig. 5-1C were also isolated from a normal human placental cDNA library (Mager, 1989). In this study I have sequenced the termini of each of those eight clones and found that three of them were indeed derived from transcripts that had been polyadenylated within an RTVL-H LTR. Two clones were identical and will be referred to as cPj-LTR whereas the third clone, cPp-LTR, was unrelated. Both clones are shown schematically in Fig. 5-2. Further sequence analysis of cPp-LTR revealed no ORF and hybridization analysis indicated that this clone contains a highly repetitive sequence (data not shown). In addition, Northern analysis using the probe indicated in Fig. 5-2B (a 390-bp StyI-StuI fragment) did not detect related transcripts in placenta. For these reasons, cPp-LTR was not pursued further.

Figure 5-1. Strategy for isolating non-RTVL-H cellular sequences being polyadenylated within RTVL-H LTRs. The strategy is described in detail in the text. (A): Diagram of a genomic RTVL-H element and its unit length transcript. (B): Diagram of an LTR serving as a promoter for unrelated 3’ sequences. (C): Diagram of an LTR providing a polyadenylation signal for an unrelated transcript. Genomic DNA is represented by thick lines (solid lines are RTVL-H sequences and broken lines are unrelated cellular sequences). LTRs are shown by the large open boxes. U3, R and U5 are the three functional domains of the LTR. Black boxes labelled a, b and c represent probes specific for the U3, U5 and internal sequences respectively of the RTVL-H element. Transcripts, represented by thin lines, and the probes that they would hybridize to are shown below the genomic sequences.

Figure 5-2. Schematic representation of two unrelated cDNA clones correctly polyadenylated within an RTVL-H LTR. (A): cPj-LTR, 1.25 kb. (B): cPp-LTR, 1.95 kb. The black boxes indicate LTR-related sequences and the open box shows an ORF in cPj-LTR. The small numbered arrows above and below the cPj-LTR clone indicate primers used later in PCR analysis. Primer sequences are given in Table 2-1. Positions of DNA fragments used as probes are indicated with lines beneath each clone. Restriction enzymes: E, EcoRI; S, StyI; Sm, SmaI; St, StuI. The 5’ EcoRI sites (E) created by cloning are enclosed in parentheses.

Sequence analysis of cPj-LTR: Initial partial sequencing of cPj-LTR revealed an ORF, so this clone was sequenced on both strands in its entirety.
The nt sequence of cPj-LTR and the aa translation of the ORF are shown in Fig. 5-3. The ORF extends from the extreme 5’ end of the clone for 729 bp, ending in a TGA stop codon, and codes for a putative protein of 243 aa. The first ATG occurs at nt 64 and is in a favourable context for translation initiation (Kozak, 1987). Use of this ATG would result in a protein of 223 aa. However, since the ORF begins at the 5’ terminus, it is likely that the cPj-LTR cDNA is 5’ truncated. The LTR sequence begins at nt 756, which is 24 bp downstream of the termination codon. The LTR polyadenylation signal 5’-ATTAAAAA is located at nt 1214-1221 and polyadenylation has occurred following a C at nt position 1233. The sequence of the LTR classifies it as Type II, an RTVL-H LTR subtype representing 30-40% of all RTVL-H LTRs and which appears to be involved more frequently in polyadenylating heterologous transcripts than the more abundant Type I LTRs (Mager, 1989). The reason for this finding is unknown since both types of LTRs can function efficiently as a polyadenylation signal of a reporter gene (Mager, 1989).

The translated sequence of the cPj-LTR ORF was analyzed using the Genetics Computer Group (Madison, WI) software package (Devereux et al., 1984). The sequence contains no internal repeats and predictions regarding secondary structure and hydrophobicity do not indicate the presence of any recognized protein domain. Database searches identified no obvious significant homologies with any known DNA or protein sequence or any recognized protein motif. However, these searches did reveal that a GWWW segment, present in cPj-LTR at aa positions 119-122 and bracketed in Fig. 5-3, is found in the coat proteins of picornaviruses such as polioviruses and rhinoviruses.
The tryptophan triplet is also found in the coat protein of woodchuck hepatitis B virus and twice in the coding region of human cytomegalovirus. A general scan of the 33989 protein sequences in the National Biomedical Research Foundation Protein Identification Resource database (release 30) detected only 17 other unrelated proteins containing a WWW triplet, indicating that the occurrence of this segment is very rare. Further comparisons of the immediate regions surrounding the WWW motif in cPj-LTR and the various other proteins revealed no other shared similarities. Thus, although the presence of WWW in the translated sequence of cPj-LTR is interesting, its relevance to the functional role of the putative protein is currently unknown.

Because the gene that produced the cPj-LTR transcript is not related to any known genes, it has been termed PLT for placental LTR terminated gene.

Figure 5-3. Sequence analysis of cPj-LTR. The nt sequence of cPj-LTR and the aa translation of its ORF are shown. The LTR is boxed and the polyadenylation signal is underlined. Possible N-glycosylation sites are overlined. The GWWW aa motif mentioned in the text is bracketed. This sequence has been deposited in GenBank with the accession number M92449.

Genomic organization of the PLT locus: Southern analysis was performed to examine the genomic organization of PLT related gene(s). Human genomic DNA was digested with several different restriction enzymes and hybridized to probe 1, a 760 bp EcoRI-StuI fragment encompassing the entire non-LTR portion of cPj-LTR (see Fig. 5-2A). Several bands were seen with each digest (data not shown), indicating either that PLT is a member of a low-copy-number gene family, or that the probe spans multiple exons. To attempt to distinguish between these possibilities, a smaller 120 bp StyI-StuI probe (probe 2, Fig. 5-2A) was used to rehybridize the same filter.
Although two hybridizing restriction fragments were observed with most digests, Fig. 5-4A shows that a single large BamHI fragment of approximately 18 kb hybridizes to this probe. This result suggests that PLT is a single copy, multi-exon gene and that the 120 bp probe 2 spans at least two exons.

To determine if the PLT locus has been conserved between species, human, monkey and mouse DNAs were digested and hybridized to probe 1. Even under high stringency washing conditions, bands were seen in all lanes (Fig. 5-4B). The presence of PLT-related sequences in the mouse suggests that this gene has been evolutionarily conserved and thus has functional significance.

Figure 5-4. (A): Southern analysis of human DNA. 5 µg of human genomic DNA from a normal individual was digested with various restriction enzymes and probed with probe 2 (see Fig. 5-2). Hybridization was done at 65°C and the final post-hybridization wash was done at 65°C in 1xSSC, 1% SDS. Lanes are: 1, PvuII; 2, BglII; 3, HindIII; 4, BamHI; 5, EcoRI; 6, KpnI; 7, PstI. (B): Evolutionary conservation of Pj-related sequences. 5 µg of EcoRI-digested human (lane 1), African green monkey (lane 2), and mouse (lane 3) DNAs were probed with probe 1 (see Fig. 5-2A). Hybridization was done at 55°C, and the final wash was at 65°C in 3xSSC, 1% SDS.

Expression of PLT-related transcripts: Since cPj-LTR was isolated from a placental cDNA library, Northern analysis was performed to see if similar chimeric transcripts could be detected in placental RNA from different individuals. A panel of total RNAs from placental tissues and polyA+ RNAs from various human cell lines was hybridized to cPj-LTR probe 1. This probe detected the same set of three transcripts of approximately 1.65, 1.9 and 2.4 kb in all placental samples (Fig. 5-5A). Rehybridization of this panel of RNAs to a Type II LTR-specific probe (probe 3 in Fig. 5-2A; also see Fig. 3-5) detected a 1.65 kb transcript in the decidua and villus samples and in one chorion sample, suggesting that the cPj-LTR fusion transcript exists in normal placental RNA (Fig. 5-5B). However, the two larger placental transcripts of 1.9 and 2.4 kb detected by probe 1 did not hybridize to the LTR probe 3. In addition, the 1.65 kb band did not hybridize to the LTR probe in the remaining chorion samples or in the amnion samples, indicating that this band may represent a heterogeneous population of similar-sized transcripts. Fig. 5-5A shows that one or both of the 1.9 and 2.4 kb transcripts detected by the non-LTR probe 1 are also present in various cell lines, indicating that expression of this gene is not specific to placenta. However, neither probe detects the 1.65 kb transcript in any of the cell lines, suggesting that expression of the LTR chimeric transcript does not occur in these cell lines and may be restricted to placenta. A faint band of approximately 5 kb was also observed in some of the placental total RNA samples using probe 1 (Fig. 5-5A). The origin of this transcript is not known but it may represent nonspecific cross hybridization to the 4.8 kb rRNA species since it was not observed in the lanes of polyA+ RNA.

It is of interest that although there are at least 300 Type II RTVL-H LTRs in the human genome (Mager, 1989), the LTR probe specific for this LTR subpopulation detects just two primary bands in Northern analysis of the placental tissues. The larger of these two bands may represent the LTR-containing PLT transcript while the smaller diffuse band of approximately 0.9 kb in decidua and villus samples presumably represents a different transcript or transcripts containing LTR sequences. This finding indicates that very few Type II LTRs are expressed to a detectable level in placenta.
It should be noted that the prominent bands of approximately 5.4 kb in HEp2 cells (a subline of HeLa) and 1.6 and 2 kb in the leukemia cell line K562 shown in Fig. 5-5B represent other RTVL-H transcripts unrelated to PLT that contain LTR sequences of the Type II subpopulation. Interestingly, although RTVL-H sequences in general are highly expressed in NTera2D1 cells (Wilkinson et al., 1990), those containing Type II LTRs are not (Fig. 5-5B; Wilkinson, 1993).

Figure 5-5. Northern analysis of PLT-related sequences. 10 µg of total RNA isolated from the amnion, chorion, decidua and villus of three different placental samples, and 5 µg of polyA+ RNA isolated from various human cell lines were electrophoresed in a 1.2% agarose gel, blotted and hybridized using conditions described by Wilkinson et al., 1990. (A): Hybridization to Pj probe 1 (13 day exposure). (B): The same filter was stripped and rehybridized to Type II LTR probe 3 (see Fig. 5-2A; 14 day exposure). Cell lines: KG-1 and K562, leukemia; 5637, bladder carcinoma; 293, embryonal kidney; NTera2D1, teratocarcinoma; HEp2, a subline of HeLa.

Detection of PLT-LTR RNA and genomic DNA by PCR: The Northern analysis discussed above suggests that a PLT-LTR fusion transcript is expressed in placenta: a similar sized transcript (1.65 kb) was detected by both the PLT-specific probe and the LTR-specific probe. However, it is also possible that the 1.65 kb band represents a heterogeneous population of similar sized transcripts with different transcripts hybridizing to each probe. To confirm the existence of a PLT-LTR fusion transcript, PCR analysis was carried out using decidua polyA+ RNA. The RNA was first copied into cDNA and then PCR was performed using primers that span the PLT-LTR junction. After the first round of PCR using primers 1 and 4 (see Fig. 5-2A), the expected 540 bp fragment could not be detected on an EtdBr-stained gel although a trace signal was seen when the amplified DNA was hybridized to probe 2 (data not shown). When the amplified product from the first round of PCR was reamplified using nested primers (primers 2 and 3, Fig. 5-2A), the expected 120 bp band was seen (Fig. 5-6). These results confirm that PLT-LTR fusion transcripts are normally expressed in placenta but at a very low level.

To determine if the RTVL-H insertion into the PLT locus was a recent event, PCR was also performed on genomic DNA from different primate species including human, chimpanzee, gorilla, orangutan, gibbon, baboon, African green monkey and marmoset. In this case, the 5’ primer used was immediately upstream of the LTR (nt 735-754 in Fig. 5-3) and the 3’ primer was primer 4 (Fig. 5-2A). The expected size fragment of ~310 bp was amplified from all DNA samples except marmoset (Fig. 5-7). The amplified band is of slightly different sizes in the different primate species, which probably results from the LTR, which is Type II, having different numbers of the Type II repeat. Such variability is common in tandemly repeated structures and results from recombination and/or slippage during DNA synthesis. Thus, these PCR results suggest that this RTVL-H insertion occurred after the divergence of the Old World lineage from New World monkeys (45 MYrs ago) but prior to the divergence of Old World monkeys and the apes (30 MYrs ago).

Figure 5-6. PCR analysis. 2 µg of decidua polyA+ RNA was reverse transcribed using random hexamers as primers and Moloney murine leukemia virus reverse transcriptase. One third of the cDNA reaction was used in PCR with primers that span the cellular-LTR junction (shown schematically in Fig. 5-2A). Two rounds of amplification were done and products were analyzed by electrophoresis in a 2% agarose gel followed by Southern blot analysis using probe 2. Round 1 used primers 1 and 4 (not shown). Round 2 was done using a portion of the reaction mixture from Round 1 and the nested primers 2 and 3. (A): EtdBr-stained gel. (B): Southern analysis of gel in A using probe 2. Lane 1, no DNA negative control; 2, cPj-LTR clone as a positive control; 3, decidua cDNA; M, DNA markers. The arrow indicates the expected 125 bp amplification product. Primers used: Primer 1 (nt position 502-521) and primer 2 (nt position 659-678) are specific for the non-LTR portion of cPj-LTR. Primer 3 (nt position 784-760) and primer 4 (nt position 1045-1026) are LTR specific. Coordinates refer to Fig. 5-3. Primer sequences are listed in Table 2-1.

Figure 5-7. PCR analysis of the cPj-LTR RTVL-H integration site in primates. Shown is the EtdBr-stained gel of the PCR products obtained using primers spanning the cellular/LTR junction of clone cPj-LTR (see Table 2-1 for primer sequences). Lane 1, no DNA negative control; 2, human; 3, chimpanzee; 4, gorilla; 5, gibbon; 6, orangutan; 7, baboon; 8, African green monkey; 9, marmoset. The expected ~310 bp amplified product is indicated by the arrow.

Isolation and analysis of additional PLT-related cDNA clones: The results presented thus far indicate that a PLT-LTR fusion transcript is expressed in placenta, but that other PLT-related transcripts not associated with an LTR also exist in placenta as well as in other cell types. One explanation for these findings is that PLT is a multicopy gene, with one copy using an LTR as a polyadenylation signal. Southern analysis, however, indicates that PLT is most probably a single copy gene. An alternate explanation is that the PLT locus undergoes alternate splicing, with polyadenylation within an RTVL-H LTR being one of several possibilities. To address this question, the placental cDNA library was rescreened with probe 1 to isolate additional PLT-related clones. Of the six clones isolated, two were identical to the original cPj-LTR cDNA.
The other four PLT-related clones, cPj-1 through 4, did not contain LTR-related sequences. Restriction enzyme mapping and partial sequencing of the 5’ ends of the clones indicated that the 5’ portions of the clones are very similar, if not identical, to cPj-LTR. Clones cPj-3 and cPj-4 both extended further 5’ than cPj-LTR and the longer of these, cPj-3, adds another 47 aa to the ORF, which begins at the extreme 5’ end of the available sequence, suggesting that the cDNA clone is still not complete. We cannot determine by the size of the cDNA clones if any of them correspond to the larger PLT transcripts of 1.9 and 2.4 kb because all the clones appear to be 5’ truncated and the longest of them, cPj-3, is only 1.6 kb.

The relationship between cPj-LTR and the four related clones at their 3’ ends is quite complex and is shown schematically in Fig. 5-8. Partial sequencing revealed that homology to cPj-LTR ceases at the end of the ORF in cPj-LTR, giving a cPj-LTR, a cPj-1 and a cPj-2/3/4 branch. The cPj-1 ORF ends immediately at the same TGA stop codon as cPj-LTR and then the sequence continues into an Alu element. The 3’ end of cPj-1 is polyadenylated. After an additional 29 bp that maintain the ORF, the cPj-2/3/4 branch splits into cPj-2, cPj-3 and cPj-4. The cPj-2 ORF extends a further 10 bp before ending in a TAA stop codon, adding 13 aa to the ORF, and this cPj-2 clone terminates within an Alu sequence. The cPj-3 ORF ends immediately after the branch point in a TGA stop codon. The cPj-4 ORF continues for another 79 bp after the branch point, extending the ORF by a total of 36 aa. Interestingly, cPj-3 and cPj-4 share a common 3’ end that is different from that of cPj-1. Further database searches of the additional aa sequences obtained from clones cPj-1-4 did not reveal any significant homologies to known proteins. This collection of clones suggests that alternate 3’ splicing is occurring during the expression of the PLT locus and that at least one of the resulting transcripts is polyadenylated within an LTR.

Figure 5-8. Diagram of the 3’ portions of PLT-related cDNA clones. Approximately 10^6 plaques of the placental cDNA library were screened with probe 1 using standard procedures (Sambrook et al., 1989). Positive plaques were purified, the appropriate DNA fragments subcloned into plasmid vectors and their termini sequenced. Sequence information was also obtained from the region of the cellular-LTR junction using primer 2 (see Fig. 5-2A). The relationship between cPj-LTR and the four related clones at their 3’ ends is shown, with thick lines representing cDNA sequences (broken lines indicating regions not sequenced) and thin lines indicating the branchpoints where identity between the various clones ends. Black box is the LTR sequence and open boxes are Alu sequences. The end of the ORF in each clone is indicated by a star. Small arrow represents primer 2.

Translation of the different PLT transcripts corresponding to the cDNA clones shown in Fig. 5-8 would result in related proteins with highly variable C-terminal ends. Interestingly, similar findings have been reported for the pregnancy-specific β1 glycoprotein genes (PSβG). The PSβG genes code for a set of proteins of unknown function which are synthesized in large amounts by the syncytiotrophoblast during pregnancy (Bohn and Dati, 1983). Some of the heterogeneity of PSβG is due to alternate splicing and differential usage of polyadenylation signals that results in proteins which are identical except for differing short carboxyl ends (Streydio et al., 1990). It is intriguing that the expression of the nonhomologous PLT gene reported here also occurs in placenta and also appears to involve alternate 3’ splicing and polyadenylation events.
The significance of this complex expression pattern to the possible functions of PSβG or PLT remains to be determined.

In conclusion, I have shown that RTVL-H LTRs can serve as polyadenylation signals for non-RTVL-H initiated cellular transcripts. In general, such chimeric transcripts could represent genes in which an LTR serves as the natural polyadenylation signal or they could result from relatively recent rearrangements or insertion events that may lead to polymorphisms or genetic abnormalities. In this study, Northern and PCR analyses suggest that the PLT-LTR transcript is present in placental tissue from different individuals. In addition, PCR has confirmed the presence of the LTR in all human and Old World primate DNAs examined. These results indicate that the LTR is a natural part of the PLT locus and is involved in the normal expression of the PLT gene by providing an alternate polyadenylation signal. Sequence analysis of the PLT ORF did not reveal significant homologies to known proteins, so the possible function of this novel gene is unknown. However, its expression in primary placental tissue and differential expression in various cell lines, along with the conservation of related sequences in the mouse genome, suggest that this gene is of functional significance. These results demonstrate that an endogenous retroviral LTR has donated a polyadenylation signal to an unrelated cellular gene. These findings raise the possibility that endogenous LTRs belonging to the many families of human retrovirus-like sequences could play a functional role in the 3’ RNA processing of a variety of human genes.

CHAPTER VI
SUMMARY AND PERSPECTIVES

Transposable elements comprise ~10% of the human genome. While such sequences are recognized as a major source of genetic variability and mutation in lower eukaryotes, their importance in the human genome is only now becoming appreciated. 
Transposable elements represent a potential source of mutation through new integrations and there have been several reports in man of diseases resulting from de novo insertions of both L1 and Alu elements. Also, the fact that transposable elements represent a pool of repeated sequences dispersed throughout the genome allows for the possibility of homologous recombination between elements, leading to deletion or duplication of the intervening sequences. This can result in disease, but it can also serve as a source of duplicate and thus nonessential genetic material upon which evolution can work. The presence of a transposable element can also affect the expression of an adjacent cellular gene, which again may lead to disease or, over evolutionary time, may come to be assimilated into the normal regulation of that gene. Thus, transposable elements can have a significant impact on the human genome. The purpose of the work reported in this thesis was to examine the impact that a family of endogenous retroviral sequences, the RTVL-H family, could have had and may still have on the human genome.

In Chapter III, I examined the evolutionary history of the RTVL-H family in the primate genome. The RTVL-H family is unusual in that it is present in the human genome in a much higher copy number than most other HERV families. Most HERV families have copy numbers ranging from 1 to 100. In contrast, the RTVL-H family consists of ~1000 elements plus a similar number of solitary LTRs. I have found that the major amplification of RTVL-H elements occurred early in primate evolution, in a common ancestor to the Old World primates, after the lineage diverged from that of New World monkeys. The genomes of humans, apes and Old World monkeys each contain similar numbers of RTVL-H elements. The majority of elements share four common deletions within the pol region and lack an env domain. However, there are 50-100 elements, termed RTVL-Hp, that contain the sequences corresponding to one or more of these pol deletions and/or the env domain. 
In contrast, marmoset, a New World monkey, contains only ~50 deleted elements and a handful of RTVL-Hp elements. The copy number of solitary LTRs in marmoset has not been investigated.

I have also identified a second, more moderate, expansion of RTVL-H elements that occurred in a common ancestor to the apes. This amplification involved deleted elements and was associated with a novel LTR subtype. We currently recognize three LTR subtypes based upon differences in the repeat structure of the U3 region. Types I and II are evolutionarily older, being found in similar copy numbers in all Old World primates examined. The Type Ia LTR is present in significant numbers (~100 copies) only in the great apes. Sequence comparisons of the LTR subtypes have suggested that Type Ia arose through a recombination event involving a Type I and a Type II LTR.

It is interesting to speculate on the reasons for RTVL-H expansion. For the expansion of Type Ia elements in the great apes, we speculate that the novel repeat structure created in the Type Ia LTR may have allowed this element to escape the regulatory control of the host cell. This notion is supported by the observation that Type Ia elements show a less restricted pattern of expression, being expressed in a wider range of cell types than Type I and Type II elements.

The original expansion of deleted elements in the Old World primate ancestor may also have been associated with alterations to the LTR. The LTR subtypes were defined using LTR sequences of deleted elements. However, these three subtypes recognize only ~85% of all RTVL-H elements. It would be of interest to determine the LTR sequences of RTVL-Hp elements, especially those containing both an intact pol and an env domain. It is possible that sequence changes in the LTRs or elsewhere in the element, occurring along with the deletions in the pol and env regions, allowed a precursor RTVL-H element(s) to expand to high copy number. 
Equally important would be to examine the LTRs of the RTVL-H related sequences in New World monkeys since these elements also have not undergone any significant amplification. It is the LTRs and the internal sequences adjacent to the LTRs that are critical for the transcriptional and transpositional competence of an element. Translationally incompetent elements can still transpose if protein products are supplied in trans, either from a rare, translationally competent RTVL-Hp element or from other sources such as other HERVs or L1 elements.

An alternate explanation for the original RTVL-H expansion is that it was associated with the evolutionary divergence of the Old and New World lineages. There has been much speculation that transposable elements may be involved in speciation events, speculation supported by the observation that transposable elements can cause genetic change and tend to be activated in response to environmental and genomic stress. RTVL-H amplification in the Old World primate ancestor may have been a consequence of this divergence or may have even contributed to it. However, the observation that other HERV families, also present early in primate evolution, did not undergo any significant expansion does not support this as the sole explanation for RTVL-H expansion.

The question of when the RTVL-H progenitor sequence entered the primate genome has not been entirely answered. The RTVL-H family is considered to be primate specific and in Chapter III, I presented data showing that RTVL-H sequences have been present in the primate genome for at least 45 MYrs, before the divergence of Old World and New World primates. However, I do have preliminary data from hybridization studies, using several RTVL-H probes and done at moderate stringency, suggesting the presence of one or more RTVL-H related sequences in dog DNA. 
More hybridization studies need to be performed with prosimian and other closely related, non-primate DNAs to determine when RTVL-H sequences appeared in the mammalian/primate genome.

The structure of RTVL-H elements and their distribution in the genome suggest that these elements amplified via retrotransposition. In Chapter IV, I provided evidence for this through the identification of an RTVL-H element in the orangutan genome that appears to represent the retrotransposition of a spliced RTVL-H transcript. I also attempted to develop an assay to detect RTVL-H retrotransposition within an experimental time frame. The most promising strategy involved the use of a retrotransposition indicator gene. The neo gene is interrupted by an antisense intron and then inserted, in the reverse orientation, into a retroviral construct. Splicing of the intron will occur only during the processing of the larger LTR-promoted transcript. Reverse transcription and integration of the processed retroviral RNA restores the neo coding sequence, thereby allowing for the direct selection of cells having experienced a transposition event. Outlined in Chapter IV are some of the problems I encountered trying to develop this assay for an RTVL-H donor element, after successfully testing the neo-int indicator cassette using a standard retroviral vector. I was not able to get the RTVL-H/neo-int construct stably introduced into two different human cell lines and at present I have no explanation. It is possible that the construct is recombining with endogenous elements with loss of the selectable marker. However, similar experiments with IAP constructs transfected into mouse cell lines containing equivalent IAP copy numbers and expression levels to RTVL-H elements in the cell lines I am using encountered no such problems. 
It is also possible that the construct is integrating but is not coming under the same type of regulatory control as endogenous elements, the result being a high rate of new transpositions by the construct that is proving fatal to the cells. Such a phenomenon was observed with Ty experiments in yeast; however, the Ty construct was being induced to high levels of expression through an attached GAL1 promoter. I had no problems getting the same construct stably integrated into a mouse packaging cell line in multiple unrearranged copies. However, I could not get stable expression of the construct, which is the first requirement for retrotransposition. It may be necessary to attach a strong promoter to the RTVL-H element as was done for Ty, not only to get reliable expression but also to guarantee a high level of expression. Boeke et al. (1985) found that the rate of Ty transposition was correlated with the level of expression and they detected no transpositions by their construct unless they induced expression. This may be a necessary step for detecting RTVL-H transposition, especially since we believe the transposition rate to be very low. Also, for experiments in human cells, there must be a source of gag and pol proteins to be supplied in trans to the construct. Teratocarcinoma cells are a good choice for a host cell in that reverse transcriptase activity has been documented in these cells and VLPs have been observed. However, it may still be necessary to supply an exogenous source of gag-pol products, i.e. through the co-transfection of a second vector containing for example HTLV gag-pol proteins. Work by Heidmann’s group has suggested that the rate of transposition may be directly correlated with the level of expression of gag-pol proteins.

Retrotransposition is only one mechanism by which RTVL-H elements can amplify. In Chapter IV, I presented evidence for the existence of processed RTVL-H pseudogenes. A putative spliced element detected in certain human 
DNA samples did not possess an intact 5’ LTR, which suggested that this element may be a reverse transcribed, reintegrated RTVL-H transcript (i.e. a pseudogene) and not a retrotransposed copy; retrotransposition would have regenerated the LTRs. Interestingly, this element appears to be polymorphic in the human population, suggesting that pseudogene formation is an on-going phenomenon. Three putative spliced elements isolated from gibbon DNA also appeared to lack intact 5’ LTRs. Definitive proof that these four elements do indeed represent processed RTVL-H pseudogenes would require the isolation and characterization of the genomic loci to see if these elements have the structural features of RTVL-H transcripts and are flanked by variable-length target site duplications. It would also be interesting to devise a differential hybridization strategy, similar to that used in Chapter V, to determine the percentage of RTVL-H elements in the genome that are actually processed pseudogenes.

Another mechanism of RTVL-H amplification would be through the duplication of the surrounding genomic region. Several clusters of RTVL-H elements have been identified, on chromosomes 1p, 7q and 11p. It would be interesting to see if any of these clusters are associated with amplified genomic DNA. One approach would be to isolate probes specific for sequences flanking the RTVL-H elements and do Southern analyses to see if the probes yield one or more bands.

Lastly, in Chapter V, I examined the impact that an RTVL-H element can have on the expression of adjacent cellular sequences. Previous work by our laboratory had identified cellular sequences apparently being promoted by an RTVL-H LTR. In Chapter V, I provided evidence for two clones, isolated from a human placental cDNA library, containing non-RTVL-H related cellular sequences that have been polyadenylated within an RTVL-H LTR. One of these clones, cPj-LTR, contains an ORF of 223 aa. 
The corresponding gene, termed PLT, appears to be a single, multi-exon locus and has apparently been evolutionarily conserved. Several PLT-related transcripts were detected that are differentially expressed in different cell types. Evidence suggested that PLT mRNA undergoes alternative splicing at its 3’ end, with polyadenylation within an RTVL-H LTR occurring in one of the resulting transcripts. The function of the PLT gene and the biological relevance, if any, of the PLT-LTR fusion transcript are not known. However, these results do demonstrate that RTVL-H LTRs can affect the expression of adjacent cellular genes.

REFERENCES

Aboud, M., Rosner, M., Dombrovsky, A., Tatyana, R., Feldman, G., Tolpolar, L., Strilitz-Hassan, Y., and Flugel, R.M. 1992. Interactions between retroviruses and environmental carcinogens and their role in animal and human leukemogenesis. Leuk. Res. 16: 1061-1069.
Adachi, M., Watanabe-Fukunaga, R. and Nagata, S. 1993. Aberrant transcription caused by the insertion of an early transposable element in an intron of the Fas antigen gene of lpr mice. Proc. Natl. Acad. Sci. USA 90: 1756.
Adam, M.A. and Miller, A.D. 1988. Identification of a signal in a murine retrovirus that is sufficient for packaging of nonretroviral RNA into virions. J. Virol. 62: 3802-3806.
Adams, J.W., Kaufman, R.E., Kretschmer, P.J., Harrison, M. and Nienhuis, A.W. 1980. A family of long reiterated DNA sequences, one copy of which is next to the human beta-globin gene. Nucl. Acids Res. 8: 6113-6128.
Adler, A.J., Scheller, A., Hoffman, Y. and Robins, D.M. 1991. Multiple components of a complex androgen-dependent enhancer. Mol. Endo. 5: 1587-1596.
Aiyar, A., Cobrinik, D., Ge, Z., Kung, H.J. and Leis, J. 1992. Interaction between retroviral U5 viral RNA and the TΨC loop of the tRNAtrp primer are required for efficient initiation of reverse transcription. J. Virol. 66: 2464-2472.
Anderson, G.R., Stoler, D.L., Scott, J.P. and Farkas, B.K. 1988. 
Induction of VL30 element expression as a response to anoxic stress. In: Banbury reports 30: Transposable elements as mutagenic agents. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York. pp. 265-274.
Andrews, P.W. 1984. Retinoic acid induces neuronal differentiation of cloned human embryonal carcinoma cell line in vitro. Devel. Biol. 103: 285-293.
Andrews, P.W., Damjanov, I., Simon, D., Banting, G.S., Carlin, C., Dracopoli, N.C. and Fogh, J. 1984. Pluripotent embryonal carcinoma clones derived from the human teratocarcinoma cell line Tera-2. Lab. Invest. 50: 147-162.
Aota, S., Gojobori, T., Shigesada, K., Ozeki, H. and Ikemura, T. 1987. Nucleotide sequence and molecular evolution of mouse retrovirus-like IAP elements. Gene 56: 1-12.
Armentano, D., Yu, S-F., Kantoff, P.W., von Ruden, T., Anderson, W.F. and Gilboa, E. 1987. Effect of internal viral sequences on the utility of retroviral vectors. J. Virol. 61: 1647-1650.
Armour, J., Wong, Z., Wilson, V., Royle, N. and Jeffreys, A. 1989. Sequences flanking the repeat arrays of human minisatellites: association with tandem and dispersed repeat elements. Nucl. Acids Res. 17: 4925-4935.
Ashworth, A., Skene, B., Swift, S. and Lovell-Badge, R. 1990. Zfa is an expressed retroposon derived from an alternative transcript of the Zfx gene. EMBO J. 9: 1529-1534.
Banki, K., Maceda, J., Hurley, E., Ablonczy, E., Mattson, D., Szegedy, L., Hung, C. and Perl, A. 1992. Human T-cell lymphotropic virus (HTLV)-related endogenous sequence, HRES-1, encodes a 28 kDa protein: a possible autoantigen for HTLV-I gag-reactive autoantibodies. Proc. Natl. Acad. Sci. USA 89: 1939-1943.
Banville, D. and Boie, Y. 1989. Retroviral long terminal repeat is the promoter of the gene encoding the tumor-associated calcium-binding protein oncomodulin in the rat. J. Mol. Biol. 207: 481-490.
Barsh, G.S., Seeburg, P.H. and Gelinas, R.E. 1983. The human growth hormone gene family: structure and evolution of the chromosomal locus. Nucl. Acids Res. 11: 3939-3958.
Batzer, M.A. 
and Deininger, P.L. 1991. A human-specific subfamily of Alu sequences. Genomics 9: 481-487.
Baumruker, T., Gene, C. and Horak, I. 1988. Insertion of a retrotransposon within the 3’ end of a mouse gene provides a new functional polyadenylation signal. Nucl. Acids Res. 16: 7241-7251.
Begun, D.R. 1992. Miocene fossil hominids and the chimp-human clade. Science 257: 1929-1933.
Bender, M.A., Palmer, T.D., Gelinas, R.E. and Miller, D.A. 1987. Evidence that the packaging signal of Moloney murine leukemia virus extends into the gag region. J. Virol. 61: 1639-1646.
Berg, D.E. and Howe, M.M. (Eds): Mobile DNA. American Society for Microbiology, Washington D.C., 1989.
Biessman, H., Mason, J.M., Ferry, C., d’Hulst, M., Valgeirsdottir, K., Traverse, K.L. and Pardue, M.L. 1990. Addition of telomere-associated HeT DNA sequences ‘heals’ broken chromosome ends in Drosophila. Cell 61: 663-673.
Bird, A. 1992. The essentials of DNA methylation. Cell 70: 5-8.
Boeke, J.D. 1989. Transposable elements in Saccharomyces cerevisiae. In: Mobile DNA, Berg, D. and Howe, M. (Eds.). American Society for Microbiology, Washington, D.C., pp. 335-374.
Boeke, J.D. and Corces, V.G. 1989. Transcription and reverse transcription of retrotransposons. Annu. Rev. Microbiol. 43: 403-434.
Boeke, J.D., Garfinkel, D.J., Styles, C.A. and Fink, G.R. 1985. Ty elements transpose through an RNA intermediate. Cell 40: 491-500.
Boer, P.H., Adra, C.N., Lau, Y.-F. and McBurney, M.W. 1987. The testis-specific phosphoglycerate kinase gene pgk-2 is a recruited retroposon. Mol. Cell. Biol. 7: 3107-3112.
Bogenhagen, D.F., Sakonju, S. and Brown, D.D. 1980. A control region in the center of the 5S RNA gene directs specific initiation of transcription. II. The 3’ border of the region. Cell 19: 27-35.
Bohn, D. and Dati, F. 1983. Placental and pregnancy-related proteins. In: Ritzmann, S.E. and Killingsworth, L.M. (Eds.), Proteins in Body Fluids, Amino Acids and Tumor Markers: Diagnostic and Clinical Aspects. A.R. Liss, New York, pp. 
333-374.
Boller, K., Konig, H., Sauter, M., Mueller-Lantzsch, N., Lower, R., Lower, J. and Kurth, R. 1993. Evidence that HERV-K is the endogenous retrovirus sequence that codes for the human teratocarcinoma derived retrovirus HTDV. Virology 196: 349-353.
Bonner, T., O’Connell, C. and Cohen, M. 1982. Cloned endogenous retroviral sequences from human DNA. Proc. Natl. Acad. Sci. USA 79: 4709-4713.
Brack-Werner, R., Barton, D., Werner, T., Foellmer, B., Leib-Mosch, C., Francke, U., Erfle, V. and Hehlmann, R. 1989. Human SSAV-related endogenous retroviral element: LTR-like sequence and chromosomal localization to 18q21. Genomics 4: 68-75.
Brack-Werner, R., Sander, I., Leib-Mosch, C., Ohlmann, M., Erfle, V. and Werner, T. 1993. Endogenous retroviral elements as tools for phylogenetic analysis. J. Cancer Res. Clinical Oncology 119 (Suppl. 1): S4.
Brenner, C.A., Tam, A.W., Nelson, P.A., Engleman, E.G., Suzuki, N., Fry, K.E. and Larrick, J.W. 1989. Message amplification phenotyping (MAPPing): a technique to simultaneously measure multiple mRNAs from small numbers of cells. Biotechniques 7: 1096-1103.
Brini, A.T., Lee, G.M. and Kinet, J.-P. 1993. Involvement of Alu sequences in the cell-specific regulation of transcription of the γ chain of Fc and T cell receptors. J. Biol. Chem. 268: 1355-1361.
Britten, R.J., Baron, W.F., Stout, D.B. and Davidson, E.H. 1988. Sources and evolution of human Alu repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4770-4774.
Brooks-Wilson, A.R., Goodfellow, P.N., Povey, S., Nevanlinna, H.A., de Jong, P.T. and Goodfellow, P.J. 1990. Rapid cloning and characterization of new chromosome 10 DNA markers by Alu element-mediated PCR. Genomics 7: 617-620.
Brosius, J. 1991. Retrotransposons - seeds of evolution. Science 251: 753.
Brown, P.O. 1990. Integration of retroviral DNA. Curr. Top. Microbiol. Immunol. 157: 19-48.
Bucheton, A. 1990. I transposable elements and I-R hybrid dysgenesis in Drosophila. 
Trends Genet. 6: 16-21.
Burton, F.H., Loeb, D.D., Voliva, C.F., Martin, S.L., Edgell, M.H. and Hutchison, C.A. 1986. Conservation throughout mammalia and extensive protein-encoding capacity of the highly repeated DNA long interspersed sequence one. J. Mol. Biol. 187: 291-304.
Callahan, R., Drohan, W., Tronick, S. and Schlom, J. 1982. Detection and cloning of human DNA sequences related to the mouse mammary tumor virus genome. Proc. Natl. Acad. Sci. USA 79: 5503-5507.
Callahan, R., Chui, I.-M., Tronick, S., Roe, B., Aaronson, S. and Schlom, J. 1985. A new class of endogenous human retroviral genomes. Science 228: 1208-1211.
Canaani, E., Dreazen, O., Klar, A., Rechavi, G., Ram, D., Cohen, J.B. and Givol, D. 1983. Activation of the c-mos oncogene in a mouse plasmacytoma by insertion of an endogenous intracisternal A-particle genome. Proc. Natl. Acad. Sci. USA 80: 7118-7122.
Casavant, N.C., Hardies, S.C., Funk, F.D., Comer, M.B., Edgell, M.H. and Hutchison, C.A. 1988. Extensive movement of LINE-1 sequences in β-globin loci of Mus caroli and Mus domesticus. Mol. Cell. Biol. 8: 4669-4674.
Cavalier-Smith, T. 1978. Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox. J. Cell Sci. 34: 247-278.
Cedar, H. 1988. DNA methylation and gene activity. Cell 53: 3-4.
Chang-Yeh, A., Mold, D.E. and Huang, R.C.C. 1991. Identification of a novel murine IAP-promoted placenta-expressed gene. Nucleic Acids Res. 19: 3667-3672.
Clark, B.D., Collins, K.L., Gandy, M.S., Webb, A.C. and Auron, P.E. 1986. Genomic sequence for human prointerleukin 1 beta: possible evolution from a reverse transcribed prointerleukin 1 alpha gene. Nucl. Acids Res. 14: 7897-7914.
Cobrinik, D., Aiyar, A., Ge, Z., Katzman, M., Huang, H. and Leis, J. 1991. Overlapping retrovirus U5 sequence elements are required for efficient integration and initiation of reverse transcription. J. Virol. 65: 3864-3872.
Cobrinik, D., Soskey, L. and Leis, J. 1988. 
A retroviral RNA secondary structure required for efficient initiation of reverse transcription. J. Virol. 62: 3622-3630.
Coffin, J.M. 1992. Genetic diversity and evolution of retroviruses. Curr. Top. Microbiol. Immunol. 176: 143-164.
Cohen, M., Kato, N. and Larsson, E. 1988. ERV3 human endogenous provirus mRNAs are expressed in normal and malignant tissues and cells, but not in choriocarcinoma tumor cells. J. Cell. Biochem. 36: 121-128.
Cohen, M. and Larsson, E. 1988. Human endogenous retroviruses. Bioessays 9: 191-196.
Craig, L.C., Pirtle, I.L., Gracy, R.W. and Pirtle, R.M. 1991. Characterization of the transcription unit and two processed pseudogenes of chimpanzee triosephosphate isomerase (TPI). Gene 99: 217-227.
Craigie, R. 1992. Hotspots and warm spots: integration specificity of retroelements. Trends Genet. 8: 187-190.
Curcio, M.J. and Garfinkel, D.J. 1992. Posttranslational control of Ty1 retrotransposition occurs at the level of protein processing. Mol. Cell. Biol. 12: 2813-2825.
Curcio, M.J. and Garfinkel, D.J. 1991. Single-step selection for Ty1 element retrotransposition. Proc. Natl. Acad. Sci. USA 88: 936-940.
D’Aquila, R.T., Bechtel, L.J., Videler, J.A., Eron, J.J., Gorczyca, P. and Kaplan, J.C. 1991. Maximizing sensitivity and specificity of PCR by preamplification heating. Nucl. Acids Res. 19: 3749.
Dahl, H.-H.M., Brown, R.M., Hutchison, W.M., Maragos, C. and Brown, G.K. 1990. A testis-specific form of the human pyruvate dehydrogenase E1α subunit is coded for by an intronless gene on chromosome 4. Genomics 8: 225-232.
Daniels, G.R. and Deininger, P.L. 1985. Repeat sequence families derived from mammalian tRNA genes. Nature 317: 819-822.
Daniels, G.R. and Deininger, P.L. 1991. Characterization of a third major SINE family of repetitive sequences in the galago genome. Nucl. Acids Res. 19: 1649-1656.
Daniels, G.R., Fox, M., Lowensteiner, D., Schmid, C. and Deininger, P.L. 1983. Species specific homogeneity of the primate Alu family of repeated DNA sequences. Nucl. Acids Res. 
11: 7579-7593.
Deen, K. and Sweet, R. 1986. Murine mammary tumor virus pol-related sequences in human DNA: characterization and sequence comparison with the complete murine mammary tumor virus pol gene. J. Virol. 57: 422-432.
Deininger, P.L. 1989. SINES: Short interspersed repeat DNA elements in higher eucaryotes. In: Mobile DNA, Berg, D.E. and Howe, M.M. (Eds). American Society for Microbiology, Washington, D.C. pp. 619-636.
Deininger, P.L. and Daniels, G.R. 1986. The recent evolution of mammalian repetitive DNA elements. Trends Genet. 2: 76-80.
Deininger, P.L., Batzer, M.A., Hutchison, C.A. and Edgell, M.H. 1992. Master genes in mammalian repetitive DNA amplification. Trends Genet. 8: 307-311.
Deininger, P.L., Jolly, D.J., Rubin, C.M., Friedmann, T. and Schmid, C.W. 1981. Base sequence studies of 300 nucleotide renatured repeated human DNA clones. J. Mol. Biol. 151: 17-33.
Deka, N., Wong, E., Matera, A., Kraft, R., Leinwand, L. and Schmid, C. 1988. Repetitive nucleotide sequence insertions into a novel calmodulin-related gene and its processed pseudogene. Gene 71: 123-134.
Deragon, J-M., Sinnett, D. and Labuda, D. 1990. Reverse transcriptase activity from human embryonal carcinoma cells NTera2D1. EMBO J. 9: 3363-3368.
Desai, S., Kalyanaraman, V., Casey, J., Srinivasan, A., Anderson, P. and Devare, S. 1986. Molecular cloning and primary nucleotide sequence analysis of a distinct human immunodeficiency virus isolate reveal significant divergence in its genomic sequence. Proc. Natl. Acad. Sci. USA 83: 380-384.
Devereux, J., Haeberli, P. and Smithies, O. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucl. Acids Res. 12: 387-395.
Dombroski, B.A., Mathias, S.L., Nanthakumar, E., Scott, A.F. and Kazazian, H.H. 1991. Isolation of an active human transposable element. Science 254: 1805-1808.
Dombroski, B.A., Scott, A.F. and Kazazian, H.H. 1993. Two additional potential retrotransposons isolated from a human L1 subfamily that contains an active retrotransposable element. 
Proc. Natl. Acad. Sci. USA 90: 6513-6517.
Doolittle, W.F. and Sapienza, C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.
Doolittle, R.F., Feng, D-F., Johnson, M.S. and McClure, M.A. 1989. Origins and evolutionary relationships of retroviruses. Quart. Rev. Biol. 64: 1-30.
Doolittle, R.F. and Feng, D.-F. 1992. Tracing the origin of retroviruses. Curr. Top. Microbiol. Immunol. 176: 195-211.
Dunsmuir, P., Brorein, W.J., Simon, M.A. and Rubin, G.M. 1980. Insertion of the Drosophila transposable element copia generates a 5 base pair duplication. Cell 21: 575-579.
Echalier, G. 1989. Drosophila retrotransposons: interactions with genome. Adv. Virus Res. 36: 33-105.
Eibel, H., Gafner, J., Stotz, A. and Philippsen, P. 1981. Characterization of the yeast mobile genetic element Ty1. Cold Spring Harbor Symp. Quant. Biol. 45: 609-617.
Emanoil-Ravier, R., Mercier, G., Canivet, M., Garcette, M., Lasneret, J., Peronnet, F., Best-Belpomme, M. and Peries, J. 1988. Dexamethasone stimulates expression of transposable Type A intracisternal retroviruslike genes in mouse (Mus musculus) cells. J. Virol. 62: 3867-3869.
Emi, M., Horii, A., Tomita, N., Nishide, T., Ogawa, M., Mori, T. and Matsubara, K. 1988. Overlapping two genes in human DNA: a salivary amylase gene overlaps with a gamma-actin pseudogene that carries an integrated human endogenous retroviral DNA. Gene 62: 229-235.
Erickson, L.M., Kim, H.S. and Maeda, N. 1992. Junctions between genes in the haptoglobin gene cluster of primates. Genomics 14: 948-958.
Evans, J.P. and Palmiter, R.D. 1991. Retrotransposition of a mouse L1 element. Proc. Natl. Acad. Sci. USA 88: 8792-8795.
Faff, O., Murray, A.B., Schmidt, J., Leib-Mosch, C., Erfle, V. and Hehlmann, R. 1992. Retroviruslike particles from the human T47D cell line are related to mouse mammary tumor virus and are of human endogenous origin. J. Gen. Virol. 73: 1087-1097.
Fanning, T. and Singer, M. 1987. 
The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins. Nucl. Acids Res. 15: 2251-2260.
Favor, J. and Morawetz, C. 1992. Insertional mutations in mammals and mammalian cells. Mut. Res. 284: 53-74.
Feenstra, A., Fewell, J., Kuff, E.L. and Lueders, K. 1986. In vitro methylation inhibits the promoter activity of a cloned intracisternal A-particle LTR. Nucl. Acids Res. 14: 4343-4352.
Feinberg, A.P. and Vogelstein, B. 1983. A technique for radiolabelling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132: 6-13.
Feuchter, A. 1991. Retrovirus-like promoters in the human genome. Ph.D. dissertation. University of British Columbia, Vancouver, B.C., Canada.
Feuchter, A. and Mager, D. 1990. Functional heterogeneity of a large family of human LTR-like promoters and enhancers. Nucl. Acids Res. 18: 1261-1270.
Feuchter, A. and Mager, D. 1992. SV40 large T antigen trans-activates the long terminal repeats of a large family of human endogenous retrovirus-like sequences. Virology 187: 242-250.
Feuchter, A.E., Freeman, J.D. and Mager, D.L. 1992. Strategy for detecting cellular transcripts promoted by human endogenous long terminal repeats: Identification of a novel gene (CDC4L) with homology to yeast CDC4. Genomics 13: 1237-1246.
Feuchter-Murthy, A.E., Freeman, J.D. and Mager, D.L. 1993. Splicing of a human endogenous retrovirus to a novel phospholipase A2 related gene. Nucl. Acids Res. 21: 135-143.
Fields, C.A., Grady, D.L. and Moyzis, R.K. 1992. The human THE-LTR(0) and MstII interspersed repeats are subfamilies of a single widely distributed highly variable repeat family. Genomics 13: 431-436.
Fink, G.R. 1988. Pseudogenes in yeast? Cell 49: 5-6.
Finnegan, D.J. 1989a. Eukaryotic transposable elements and genome evolution. Trends Genet. 5: 103-107.
Finnegan, D.J. 1989b. The I factor and I-R hybrid dysgenesis in Drosophila melanogaster. In: Mobile DNA, Berg, D.E. and Howe, M.M. (Eds). 
American Society for Microbiology, Washington, D.C. pp. 503-517.
Fitch, D., Bailey, W., Tagle, D., Goodman, M., Sieu, L. and Slightom, J. 1991. Duplication of the γ-globin gene mediated by L1 long interspersed repetitive elements in an early ancestor of simian primates. Proc. Natl. Acad. Sci. USA 88: 7396-7400.
Fontdevila, A. 1992. Genetic instability and rapid speciation: are they coupled? Genetica 86: 247-258.
Fourel, G., Trepo, C., Bougueleret, L., Henglein, B., Ponzetto, A., Tiollais, P. and Buendia, M.-A. 1990. Frequent activation of N-myc genes by hepadnavirus insertion in woodchuck liver tumours. Nature 347: 294-298.
Franklin, G., Chretien, S., Hanson, I., Rochefort, H., May, F. and Westley, B. 1988. Expression of human sequences related to those of mouse mammary tumor virus. J. Virol. 62: 1203-1210.
Fraser, C., Humphries, R. and Mager, D. 1988. Chromosomal distribution of the RTVL-H family of human endogenous retrovirus-like sequences. Genomics 2: 280-287.
Fredholm, M., Policastro, P.F. and Wilson, M. 1991. The dispersion of defective endogenous murine retroviral elements suggests retrotransposition-mediated amplification. DNA Cell Biol. 10: 713-722.
Galas, D.J. 1990. Transposable genetic elements: agents of complex change. In: Mutation and the Environment, Part A. Wiley-Liss, Inc. pp. 135-144.
Garfinkel, D.J. 1992. Retroelements in microorganisms. In: The Retroviridae, Volume 1, Levy, J.A. (Ed.), Plenum Press, New York. pp. 107-158.
Garfinkel, D.J., Boeke, J.D. and Fink, G.R. 1985. Ty element transposition: reverse transcriptase and virus-like particles. Cell 42: 507-517.
Garfinkel, D.J., Hedge, A.-M., Youngren, S.D. and Copeland, T.D. 1991. Proteolytic processing of pol-TYB proteins from the yeast retrotransposon Ty1. J. Virol. 65: 4573-4581.
Garry, R. 1990. Extensive antigenic mimicry by retrovirus capsid proteins. AIDS Res. Hum. Retroviruses 6: 1361-1362.
Gattoni-Celli, S., Kirsch, K., Kalled, S. and Isselbacher, K. 1986.
Expression of type C-related endogenous retroviral sequences in human colon tumors and colon cancer cell lines. Proc. Natl. Acad. Sci. USA 83: 6127-6131.
Gaubatz, J.W., Arcement, B. and Cutler, R.G. 1991. Gene expression of an endogenous retrovirus-like element during murine development and aging. Mech. Aging Develop. 57: 71-85.
Gillin, F.D., Roufa, D.J., Beaudet, A.L. and Caskey, C.T. 1972. 8-Azaguanine resistance in mammalian cells I. Hypoxanthine-guanine phosphoribosyltransferase. Genetics 72: 239-252.
Gingerich, P.D. 1984. Primate evolution: evidence from the fossil record, comparative morphology, and molecular biology. Year Bk. Phys. Anthrop. 27: 57-72.
Gojobori, T. and Yokoyama, S. 1985. Rates of evolution of the retroviral oncogene of Moloney murine sarcoma virus and of its cellular homologues. Proc. Natl. Acad. Sci. USA 82: 4198-4201.
Goldberg, Y.P., Rommens, J.M., Andrews, S.E., Hutchinson, G.B., Lin, B., Theilmann, J., Graham, R., Glaves, M.L., Starr, E., McDonald, H., Nasir, J., Schappert, K., Kalchman, M.A., Clarke, L.A. and Hayden, M.R. 1993. Identification of an Alu retrotransposition event in close proximity to a strong candidate gene for Huntington's disease. Nature 362: 370-373.
Goodchild, N.L., Wilkinson, D.A. and Mager, D.L. 1992. A human endogenous long terminal repeat provides a polyadenylation signal to a novel, alternatively spliced transcript in normal placenta. Gene 121: 287-294.
Goodman, M., Koop, B.F., Czelusniak, J., Fitch, D.H.A., Tagle, D.A. and Slightom, J.L. 1989. Molecular phylogeny of the family of apes and humans. Genome 31: 316-335.
Gorman, C.M., Moffat, L.F. and Howard, B.H. 1982. Recombinant genomes which express chloramphenicol acetyltransferase in mammalian cells. Mol. Cell. Biol. 2: 1044-1051.
Graham, F. and van der Eb, A. 1973. A new technique for the assay of infectivity of human adenovirus-5 DNA. Virology 52: 456-457.
Gruskin, K.S., Smith, T.F. and Goodman, M. 1987. Possible origin of a calmodulin gene that lacks intervening sequences.
Proc. Natl. Acad. Sci. USA 84: 1605-1608.
Harada, F., Tsukada, N. and Kato, N. 1987. Isolation of three kinds of human endogenous retrovirus-like sequences using tRNAPro as a probe. Nucl. Acids Res. 15: 9153-9162.
Hardies, S.C., Martin, S.L., Voliva, C.F., Hutchison, C.A. and Edgell, M.H. 1986. An analysis of replacement and synonymous changes in the rodent L1 repeat family. Mol. Biol. Evol. 3: 109-125.
Harendza, C.J. and Johnson, L.F. 1990. Polyadenylation signal of the mouse thymidylate synthase gene was created by insertion of an L1 repetitive element downstream of the open reading frame. Proc. Natl. Acad. Sci. USA 87: 2531-2535.
Hartig, E., Nierlich, B., Mink, S., Nebl, G. and Cato, A.C. 1993. Regulation of mouse mammary tumor virus through sequences located in the hormone response element: involvement of cell-cell contact and a negative regulatory factor. J. Virol. 67: 813-821.
Hattori, M., Kuhara, S., Takenaka, O. and Sakaki, Y. 1986. L1 family of repetitive DNA sequences in primates may be derived from a sequence encoding a reverse transcriptase-related protein. Nature 321: 625-628.
Hawley, R.G., Shulman, M.J. and Hozumi, N. 1984. Transposition of two different intracisternal A-particle elements into an immunoglobulin kappa-chain gene. Mol. Cell. Biol. 4: 2565-2572.
Heidmann, T., Heidmann, O. and Nicolas, J-F. 1988. An indicator gene to demonstrate intracellular transposition of defective retroviruses. Proc. Natl. Acad. Sci. USA 85: 2219-2223.
Heidmann, O. and Heidmann, T. 1991. Retrotransposition of a mouse IAP sequence tagged with an indicator gene. Cell 64: 159-170.
Hellmann-Blumberg, U., McCarthy-Hintz, M.F., Gatewood, J.M. and Schmid, C.W. 1993. Developmental differences in methylation of human Alu repeats. Mol. Cell. Biol. 13: 4523-4530.
Henthorn, P.S., Knoll, B.J., Raducha, M., Rothblum, K.N., Slaughter, C., Weiss, M., Lafferty, M.A., Fischer, T. and Harris, H. 1986. Products
of two common alleles at the locus for human placental alkaline phosphatase differ by seven amino acids. Proc. Natl. Acad. Sci. USA 83: 5597-5601.
Henthorn, P., Zervos, P., Raducha, M., Harris, H. and Kadesch, T. 1988. Expression of a human placental alkaline phosphatase gene in transfected cells: use as a reporter for studies of gene expression. Proc. Natl. Acad. Sci. USA 85: 6342-6346.
Higuchi, R. 1990. Recombinant PCR. In: PCR Protocols: A Guide to Methods and Applications. Innis, M.A., Gelfand, D.H., Sninsky, J.J. and White, T.J. (Eds.), Academic Press, New York, pp. 177-183.
Hirose, Y., Takamatsu, M. and Harada, F. 1993. Presence of env genes in members of the RTVL-H family of human endogenous retrovirus-like elements. Virology 192: 52-61.
Hobbs, H.H., Brown, M.S., Goldstein, J.L. and Russell, S.W. 1986. Deletion of exon encoding cysteine rich repeat of low density lipoprotein receptor alters its binding specificity in a subject with familial hypercholesterolemia. J. Biol. Chem. 261: 13144-13120.
Holmes, S.E., Singer, M.F. and Swergold, G.D. 1992. Studies on p40, the leucine zipper motif-containing protein encoded by the first open reading frame of an active human LINE-1 transposable element. J. Biol. Chem. 267: 19765-19768.
Horn, T., Huebner, K., Croce, C. and Callahan, R. 1986. Chromosomal locations of members of a family of novel endogenous human retroviral genomes. J. Virol. 58: 955-959.
Horowitz, M., Luria, S., Rechavi, G. and Givol, D. 1984. Mechanism of activation of the mouse c-mos oncogene by the LTR of an intracisternal A-particle gene. EMBO J. 3: 2937-2941.
Horsthemke, B., Beisiegel, U., Dunning, A., Havinga, J.R., Williamson, R. and Humphries, S. 1987. Unequal crossing-over between two Alu-repetitive DNA sequences in the low-density-lipoprotein receptor gene. Eur. J. Biochem. 164: 77-81.
Houck, C.M., Rinehart, F.P. and Schmid, C.W. 1979. A ubiquitous family of repeated DNA sequences in the human genome. J. Mol. Biol. 132: 289-306.
Howe, C.C. and Overton, G.C. 1986.
Expression of the intracisternal A-particle is elevated during differentiation of embryonal carcinoma cells. Mol. Cell. Biol. 6: 150-157.
Hsiao, W.L., Gattoni-Celli, S. and Weinstein, I.B. 1986. Effects of 5-azacytidine on the expression of endogenous retrovirus-related sequences in C3H 10T1/2 cells. J. Virol. 57: 1119-1126.
Hu, W-S. and Temin, H.M. 1990. Retroviral recombination and reverse transcription. Science 250: 1227-1233.
Hull, R. and Will, H. 1989. Molecular biology of viral and nonviral retroelements. Trends Genet. 5: 357-359.
Hutchinson, G.B., Andrew, S.E., McDonald, H., Goldberg, Y.P., Graham, R., Rommens, J.M. and Hayden, M.R. 1993. An Alu element retroposition in two families with Huntington disease defines a new active Alu subfamily. Nucl. Acids Res. 21: 3379-3383.
Hutchison, C.A., Hardies, S.C., Loeb, D.D., Shehee, W.R. and Edgell, M.H. 1989. LINEs and related retroposons: long interspersed repeated sequences in the eucaryotic genome. In: Mobile DNA, Berg, D.E. and Howe, M.M. (Eds). American Society for Microbiology, Washington, D.C. pp. 593-617.
Hwu, H.R., Roberts, J.W., Davidson, E.H. and Britten, R.J. 1986. Insertion and/or deletion of many repeated DNA sequences in human and higher ape evolution. Proc. Natl. Acad. Sci. USA 83: 3875-3879.
Innis, M.A., Gelfand, D.H., Sninsky, J.J. and White, T.J. (Eds). PCR Protocols: A Guide to Methods and Applications. Academic Press, New York, 1990.
Inouye, S., Yuki, S. and Saigo, K. 1986. Complete nucleotide sequence and genome organization of a Drosophila transposable genetic element, 297. Eur. J. Biochem. 154: 417-425.
Isfort, R., Jones, D., Kost, R., Witter, R. and Kung, H.J. 1992. Retrovirus insertion into herpesvirus in vitro and in vivo. Proc. Natl. Acad. Sci. USA 89: 991-995.
Jaenisch, R.A., Schnieke, A. and Harbers, K. 1985. Treatment of mice with 5-azacytidine efficiently activates silent retroviral genomes in different tissues. Proc. Natl. Acad. Sci.
USA 82: 1451-1455.
Jahner, D., Stuhlmann, H., Stewart, C.L., Harbers, K., Lohler, J., Simon, I. and Jaenisch, R. 1982. De novo methylation and expression of retroviral genomes during mouse embryogenesis. Nature 293: 370-374.
Jenkins, N.A., Copeland, N.G., Taylor, B.A. and Lee, B.K. 1981. Dilute (d) coat colour mutation of DBA/2J mice is associated with the site of integration of an ecotropic MuLV genome. Nature 293: 370-374.
Jensen, S. and Heidmann, T. 1991. An indicator gene for detection of germline retrotransposition in transgenic Drosophila demonstrates RNA-mediated transposition of the LINE I element. EMBO J. 10: 1927-1937.
Ji, H., Moore, D.P., Blomberg, M.A., Braiterman, L.T., Voytas, D.F., Natsoulis, G. and Boeke, J.D. 1993. Hotspots for unselected Ty1 transposition events on yeast chromosome III are near tRNA genes and LTR sequences. Cell 73: 1007-1018.
Johansen, T., Holm, T. and Bjorklid, R. 1989. Members of the RTVL-H family of human endogenous retrovirus-like elements are expressed in placenta. Gene 79: 259-267.
Johnson, G.R., Gonda, T.J., Metcalf, D., Hariharan, I.K. and Cory, S. 1989. A lethal myeloproliferative syndrome in mice transplanted with bone marrow cells infected with a retrovirus expressing granulocyte-macrophage colony stimulating factor. EMBO J. 8: 441-448.
Johnson, P.M., Lyden, T.W. and Mwenda, J.M. 1990. Endogenous retroviral expression in the human placenta. Am. J. Reprod. Immunol. 23: 115-120.
Jolly, D.J., Okayama, H., Berg, P., Esty, A.C., Filpula, D., Bohlen, P., Johnson, G.G., Shively, J.E., Hunkapillar, T. and Friedmann, T. 1983. Isolation and characterization of a full-length expressible cDNA for human hypoxanthine phosphoribosyltransferase. Proc. Natl. Acad. Sci. USA 80: 477-481.
Jurka, J. 1989. Subfamily structure and evolution of the human L1 family of repetitive sequences. J. Mol. Evol. 29: 496-503.
Jurka, J. 1990. Novel families of interspersed repetitive elements from the human genome. Nucl. Acids Res. 18: 137-141.
Jurka, J.,
Kaplan, D.J., Duncan, C.H., Walichiewicz, J., Milosavljevic, A., Murali, G. and Solus, J.F. 1993. Identification and characterization of new human medium reiteration frequency repeats. Nucl. Acids Res. 21: 1273-1279.
Jurka, J. and Milosavljevic, A. 1991. Reconstruction and analysis of human Alu genes. J. Mol. Evol. 32: 105-121.
Jurka, J. and Smith, T. 1988. A fundamental division in the Alu family of repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4775-4778.
Jurka, J., Walichiewicz, J. and Milosavljevic, A. 1992. Prototypic sequences for human repetitive DNA. J. Mol. Evol. 35: 286-291.
Jurka, J. and Zuckerkandl, E. 1991. Free left arms as precursor molecules in the evolution of Alu sequences. J. Mol. Evol. 33: 49-56.
Kadesch, T. and Berg, P. 1986. Effects of the position of the simian virus 40 enhancer on expression of multiple transcription units in a single plasmid. Mol. Cell. Biol. 6: 2593-2601.
Kadesch, T., Zervos, P. and Ruezinsky, D. 1986. Functional analysis of the murine IgH enhancer: evidence for negative control of cell type specificity. Nucl. Acids Res. 14: 8209-8221.
Kambhu, S., Falldorf, P. and Lee, J. 1990. Endogenous retroviral long terminal repeats within the HLA-DQ locus. Proc. Natl. Acad. Sci. USA 87: 4927-4931.
Kannan, P., Buettner, R., Pratt, D. and Tainsky, M. 1991. Identification of a retinoic acid-inducible endogenous retroviral transcript in the human teratocarcinoma-derived cell line PA-1. J. Virol. 65: 6343-6348.
Kaplan, D.J., Jurka, J., Solus, J.F. and Duncan, C.H. 1991. Medium reiteration frequency repetitive sequences in the human genome. Nucl. Acids Res. 19: 4731-4738.
Kato, N., Pfeifer-Ohlsson, S., Kato, M., Larsson, E., Rydnert, J., Ohlsson, R. and Cohen, M. 1987. Tissue-specific expression of human provirus ERV3 mRNA in human placenta: two of the three ERV3 mRNAs contain human cellular sequences. J. Virol. 61: 2182-2191.
Kato, N., Larsson, E. and Cohen, M. 1988. Absence of expression of a human endogenous retrovirus is correlated with choriocarcinoma.
Int. J. Cancer 41: 380-385.
Kato, N., Shimotohno, K., VanLeeuwen, D. and Cohen, M. 1990. Human proviral mRNAs downregulated in choriocarcinoma encode a zinc finger protein related to Kruppel. Mol. Cell. Biol. 10: 4401-4405.
Kazazian, H.H., Wong, C., Youssoufian, H., Scott, A.F., Phillips, D.G. and Antonarakis, S.E. 1988. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature 332: 164-166.
Keydar, I., Ohno, T., Nayak, R., Sweet, R., Simoni, F., Weiss, F., Karby, S., Mesa-Tejada, R. and Spiegelman, S. 1984. Properties of retrovirus-like particles produced by a human breast carcinoma cell line: immunological relationship with mouse mammary tumor virus proteins. Proc. Natl. Acad. Sci. USA 81: 4188-4192.
Kikuchi, Y., Ando, Y. and Shiba, T. 1986. Unusual priming mechanism of RNA-directed DNA synthesis in copia retrovirus-like particles of Drosophila. Nature 323: 824-826.
Kim, J., Yu, C.H., Bailey, A., Hardison, R. and Shen, C.K.J. 1989. Unique sequence organization and erythroid cell-specific nuclear factor-binding of mammalian θ1 globin promoters. Nucl. Acids Res. 17: 5687-5701.
King, W., Patel, M.D., Lobel, L.I., Goff, S.P. and Nguyen-Huu, M.C. 1985. Insertion mutagenesis of embryonal carcinoma cells by retroviruses. Science 228: 554-558.
Kleinerman, E.S., Lachman, L.B., Knowles, R.D., Snyderman, R. and Cianciolo, G.J. 1987. A synthetic peptide homologous to the envelope proteins of retroviruses inhibits monocyte-mediated killing by inactivating interleukin 1. J. Immunol. 139: 2329.
Korbmacher, C., Konig, H., Boller, K., Lower, R., Kurth, R. and Lower, J. 1993. The gag gene of the endogenous retrovirus HERV-K is sufficient for particle production. J. Cancer Res. Clin. Oncology 119 (Suppl. 1): S5.
Kozak, M. 1987. An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucl. Acids Res. 15: 8125-8132.
Kress, M., Barra, Y., Seidman, J.G., Khoury, G. and Jay, G.
1984. Functional insertion of an Alu type 2 (B2 SINE) repetitive sequence in murine class I genes. Science 226: 974-977.
Krieg, A.M., Gourley, M.F. and Perl, A. 1992. Endogenous retroviruses: potential etiologic agents in autoimmunity. FASEB J. 6: 2537-2544.
Kroger, B. and Horak, I. 1987. Isolation of novel human retrovirus-related sequences by hybridization to synthetic oligonucleotides complementary to the tRNAPro primer binding site. J. Virol. 61: 2071-2075.
Kuff, E.L. 1988. Factors affecting retrotransposition of intracisternal A-particle proviral elements. In: Banbury reports 30: Transposable elements as mutagenic agents. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York. pp. 79-89.
Kuff, E.L. and Lueders, K.K. 1988. The intracisternal A-particle gene family: structure and functional aspects. Adv. Cancer Res. 51: 183-276.
La Mantia, G., Pengue, G., Maglione, D., Pannuti, A., Pascucci, A. and Lania, L. 1989. Identification of new human repetitive sequences: characterization of the corresponding cDNAs and their expression in embryonal carcinoma cells. Nucl. Acids Res. 17: 5913-5922.
La Mantia, G., Maglione, D., Pengue, G., Di Cristofano, A., Simeone, A., Lanfrancone, L. and Lania, L. 1991. Identification and characterization of novel human endogenous retroviral sequences preferentially expressed in undifferentiated embryonal carcinoma cells. Nucl. Acids Res. 19: 1513-1520.
La Mantia, G., Majello, B., Di Cristofano, A., Strazzullo, M., Minchiotti, G. and Lania, L. 1992. Identification of regulatory elements within the minimal promoter region of the human endogenous ERV9 proviruses: accurate transcription initiation is controlled by an Inr-like element. Nucl. Acids Res. 20: 4129-4136.
Labuda, D., Sinnett, D., Richer, C., Deragon, J.-M. and Striker, G. 1991. Evolution of mouse B1 repeats: 7SL RNA folding pattern conserved. J. Mol. Evol. 32: 405-414.
Lamb, B.T., Satyamoorthy, K., Solter, D., Basu, A., Xu, M.Q., Weinmann, R. and Howe, C.C.
1992. A DNA element that regulates expression of an endogenous retrovirus during F9 cell differentiation is E1A dependent. Mol. Cell. Biol. 12: 4824-4833.
Lania, L., Di Cristofano, A., Strazzullo, M., Pengue, G., Majello, B. and La Mantia, G. 1992. Structure and functional organization of the human endogenous retroviral ERV9 sequences. Virology 191: 464-468.
Larsson, E., Anderson, A.C., Holmberg, L., Ohlsson, R., Kato, N., Callacio, J. and Cohen, M. 1993. Expression of an endogenous retrovirus, HERV-R, in human tissues. J. Cancer Res. Clin. Oncology 119 (Suppl. 1): S6.
Larsson, E., Kato, N. and Cohen, M. 1989. Human endogenous proviruses. Curr. Top. Microbiol. Immunol. 148: 115-132.
Lehrman, M.A., Goldstein, J.L., Russell, D.W. and Brown, M.S. 1987. Duplication of seven exons in LDL receptor gene caused by Alu-Alu recombination in a subject with familial hypercholesterolemia. Cell 48: 827-835.
Leib-Mosch, C., Brack, R., Werner, T., Erfle, V. and Hehlmann, R. 1986. Isolation of an SSAV-related endogenous sequence from human DNA. Virology 155: 666-677.
Leib-Mosch, C., Brack-Werner, R., Werner, T., Bachmann, M., Faff, O., Erfle, V. and Hehlmann, R. 1990. Endogenous retroviral elements in human DNA. Cancer Res. 50(S): 5636s-5642s.
Leib-Mosch, C., Bachmann, M., Geigl, E.-M., Brack-Werner, R., Werner, T., Erfle, V. and Hehlmann, R. 1992. Expression of S71-related sequences in human cells. Haematology and Blood Transfusion 35: 256-259.
Leib-Mosch, C., Haltmeier, M., Werner, T., Geigl, E.-M., Brack-Werner, R., Francke, U., Erfle, V. and Hehlmann, R. Genomic distribution and transcription of solitary HERV-K LTRs. Genomics, in press.
Leibold, D.M., Swergold, G.D., Singer, M.F., Thayer, R.E., Dombroski, B.A. and Fanning, T.G. 1990. Translation of LINE-1 DNA elements in vitro and in human cells. Proc. Natl. Acad. Sci. USA 87: 6990-6994.
Linial, M.L. and Miller, A.D. 1990. Retroviral RNA packaging: sequence requirements and implications. Curr. Top. Microbiol. Immunol. 157: 125-152.
Liu, A.Y.
and Abraham, B.A. 1991. Expression of a hybrid human endogenous retrovirus and calbindin gene in a prostate cell line. Cancer Res. 51: 4107-4110.
Liu, Q-R. and Chan, P. 1990. Identification of a long stretch of homopurine homopyrimidine sequence in a cluster of retroposons in the human genome. J. Mol. Biol. 212: 453-459.
Liu, W-M. and Schmid, C.W. 1993. Proposed roles for DNA methylation in Alu transcriptional repression and mutational inactivation. Nucl. Acids Res. 21: 1351-1359.
Lloyd, E.A. and Gould, S.J. 1993. Species selection on variability. Proc. Natl. Acad. Sci. USA 90: 595-599.
Lower, J., Wondrak, E.M. and Kurth, R. 1987. Genome analysis and reverse transcriptase activity of human teratocarcinoma-derived retroviruses. J. Gen. Virol. 68: 2807-2815.
Lower, R., Boller, K., Hasenmaier, B., Korbmacher, C., Muller-Lantzsch, N., Lower, J. and Kurth, R. 1993a. Identification of human endogenous retroviruses with complex mRNA expression and particle formation. Proc. Natl. Acad. Sci. USA 90: 4480-4484.
Lower, R., Lower, J., Tondera-Koch, C. and Kurth, R. 1993b. A general method for the identification of transcribed retrovirus sequences (R-U5 PCR) reveals the expression of the human endogenous retrovirus loci HERV-H and HERV-K in teratocarcinoma cells. Virology 192: 501-511.
Luan, D.D., Korman, M.H., Jakubczak, J.L. and Eickbush, T.H. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72: 595-605.
Luciw, P.A. and Leung, N.J. 1992. Mechanisms of retrovirus replication. In: The Retroviridae, Volume 1, Levy, J.A. (Ed). Plenum Press, New York. pp. 159-298.
Lueders, K.K., Fewell, J.W., Kuff, E.L. and Koch, T. 1984. The long terminal repeat of an intracisternal A particle gene functions as a promoter when introduced into eucaryotic cells by transfection. Mol. Cell. Biol. 4: 2128-2135.
Luria, S. and Horowitz, M. 1986.
The long terminal repeat of the intracisternal A particle as a target for transactivation by oncogene products. J. Virol. 57: 998-1003.
McCarrey, J.R. and Thomas, K. 1987. Human testis-specific PGK gene lacks introns and possesses characteristics of a processed gene. Nature 326: 501-505.
McClintock, B. 1952. Chromosome organization and genic expression. Cold Spring Harbor Symp. Quant. Biol. 16: 13-47.
McClure, M.A. 1991. Evolution of retroposons by acquisition or deletion of retrovirus-like genes. Mol. Biol. Evol. 8: 835-856.
Maeda, N. 1985. Nucleotide sequence of the haptoglobin and haptoglobin-related gene pair. J. Biol. Chem. 260: 6698-6709.
Maeda, N. and Kim, H-S. 1990. Three independent insertions of retrovirus-like sequences in the haptoglobin gene cluster of primates. Genomics 8: 671-683.
Mager, D. 1989. Polyadenylation function and sequence variability of the long terminal repeats of the human endogenous retrovirus-like family RTVL-H. Virology 173: 591-599.
Mager, D.L. and Freeman, D. 1987. Human endogenous retrovirus-like genome with type C pol sequences and gag sequences related to human T-cell lymphotropic viruses. J. Virol. 61: 4060-4066.
Mager, D. and Goodchild, N. 1989. Homologous recombination between the LTRs of a human retrovirus-like element causes a 5-kb deletion in two siblings. Am. J. Hum. Genet. 45: 848-854.
Mager, D.L. and Henthorn, P.S. 1984. Identification of a retrovirus-like repetitive element in human DNA. Proc. Natl. Acad. Sci. USA 81: 7510-7514.
Mager, D., Henthorn, P. and Smithies, O. 1985. A Chinese Gγ+(Aγδβ)° thalassemia deletion: comparison to other deletions in the human β-globin gene cluster and sequence analysis of the breakpoints. Nucl. Acids Res. 13: 6559-6575.
Mann, R. and Baltimore, D. 1985. Varying the position of a retrovirus packaging sequence results in the encapsidation of both unspliced and spliced RNAs. J. Virol. 54: 401-407.
Maraia, R.J., Chang, D-Y., Wolfe, A.P., Vorce, R.L. and Hsu, K. 1992.
The RNA polymerase III terminator used by a B1-Alu element can modulate 3’ processing of the intermediate RNA product. Mol. Cell. Biol. 12: 1500-1506.
Marcu, K.B., Harris, L.J., Stanton, L.W., Erickson, J., Watt, R. and Croce, C.M. 1983. Transcriptionally active c-myc oncogene is contained within NIARD, a DNA sequence associated with chromosome translocations in B-cell neoplasia. Proc. Natl. Acad. Sci. USA 80: 519-523.
Mariani-Costantini, R., Horn, T. and Callahan, R. 1989. Ancestry of a human endogenous retrovirus family. J. Virol. 63: 4982-4985.
Markert, M.L., Hutton, J.J., Wiginton, D.A., States, J.C. and Kaufman, R.E. 1988. Adenosine deaminase (ADA) deficiency due to deletion of the ADA gene promoter and the first exon by homologous recombination between two Alu elements. J. Clin. Invest. 81: 1323-1327.
Markowitz, D., Goff, S. and Bank, A. 1988. A safe packaging line for gene transfer: separating viral genes on two different plasmids. J. Virol. 62: 1120-1124.
Martin, M., Bryan, T., Rasheed, S. and Khan, A. 1981. Identification and cloning of endogenous retroviral sequences present in human DNA. Proc. Natl. Acad. Sci. USA 78: 4892-4896.
Martin, S.L. 1991. Ribonucleoprotein particles with LINE-1 RNA in mouse embryonal carcinoma cells. Mol. Cell. Biol. 11: 4804-4807.
Matera, A.G., Hellmann, U. and Schmid, C.W. 1990. A transpositionally and transcriptionally competent Alu subfamily. Mol. Cell. Biol. 10: 5424-5432.
Mathias, S.L., Scott, A.F., Kazazian, H.H., Boeke, J.D. and Gabriel, A. 1991. Reverse transcriptase encoded by a human transposable element. Science 254: 1808-1810.
May, F., Westley, B., Rochefort, H., Buetti, E. and Diggelmann, H. 1983. Mouse mammary tumor virus related sequences are present in human DNA. Nucl. Acids Res. 11: 4127-4139.
May, F. and Westley, B. 1986. Structure of a human retroviral sequence related to mouse mammary tumor virus. J. Virol. 60: 743-749.
May, F.E.B. and Westley, B.R. 1989.
Characterization of sequences related to the mouse mammary tumor virus that are specific to MCF-7 breast cancer cells. Cancer Res. 49: 3879-3883.
Mermer, B., Colb, M. and Krontiris, T. 1987. A family of short, interspersed repeats is associated with tandemly repetitive DNA in the human genome. Proc. Natl. Acad. Sci. USA 84: 3320-3324.
Miki, Y., Nishisho, I., Horii, A., Miyoshi, Y., Utsunomiya, J., Kinzler, K., Vogelstein, B. and Nakamura, Y. 1992. Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 52: 643-645.
Minakami, R., Kurose, K., Etoh, K., Furuhata, Y., Hattori, M. and Sakaki, Y. 1992. Identification of an internal cis-element essential for the human L1 transcription and a nuclear factor(s) binding to the element. Nucl. Acids Res. 20: 3139-3145.
Misra, R., Matera, A., Schmid, C. and Rush, M. 1989. Recombination mediates production of an extrachromosomal circular DNA containing a transposon-like human element, THE-1. Nucl. Acids Res. 17: 8327-8341.
Misra, R., Shih, A., Rush, M., Wong, E. and Schmid, C. 1987. Cloned extrachromosomal circular DNA copies of the human transposable element THE-1 are related predominantly to a single type of family member. J. Mol. Biol. 196: 233-243.
Miyamoto, M.M., Slightom, J.L. and Goodman, M. 1987. Phylogenetic relations of humans and African apes from DNA sequences in the ψη-globin region. Science 238: 369-373.
Morawetz, C. 1987. Effect of irradiation and mutagenic chemicals on the generation of ADH-2 constitutive mutants in yeast: significance for the inducibility of Ty transposition. Mut. Res. 117: 53-60.
Morse, B., Rothberg, P., South, V., Spandorfer, J. and Astrin, S. 1988. Insertional mutagenesis of the myc locus by a LINE-1 sequence in a human breast carcinoma. Nature 333: 87-90.
Mueller-Lantzsch, N., Sauter, M., Weiskircher, A., Kramer, K., Best, B., Buck, M. and Grasser, F. 1993.
The human endogenous retroviral element K10 (HERV-K10) encodes for a full length gag homologous 73 kD protein and a functional protease. AIDS Res. Human Retroviruses 9: 343-351.
Muller, F., Laufer, W., Pott, U. and Ciriacy, M. 1991. Characterization of TY1-mediated reverse transcription in Saccharomyces cerevisiae. Mol. Gen. Genet. 226: 145-153.
Muratani, K., Hada, T., Yamamoto, Y., Kaneko, T., Shigeto, Y., Ohue, P., Furuyama, J. and Higashino, K. 1991. Inactivation of the cholinesterase gene by Alu insertion: possible mechanism for human gene transposition. Proc. Natl. Acad. Sci. USA 88: 11315-11319.
Murray, A.W. 1971. The biological significance of purine salvage. Ann. Rev. Biochem. 40: 811-826.
Mushinski, J.F., Potter, M., Bauer, S.R. and Reddy, E.P. 1983. DNA rearrangement and altered RNA expression of the c-myb oncogene in mouse plasmacytoid lymphosarcomas. Science 220: 795-798.
Narita, N., Nishio, H., Kitoh, Y., Ishikawa, Y., Ishikawa, Y., Minami, R., Nakamura, H. and Matsuo, M. 1993. Insertion of a 5’ truncated L1 element into the 3’ end of exon 44 of the dystrophin gene resulted in skipping of the exon during splicing in a case of Duchenne muscular dystrophy. J. Clin. Invest. 91: 1862-1867.
O’Brien, S., Bonner, T., Cohen, M., O’Connell, C. and Nash, W. 1983. Mapping of an endogenous retroviral sequence to human chromosome 18. Nature 303: 74-77.
O’Connell, C., O’Brien, S., Nash, W. and Cohen, N. 1984. ERV3, a full-length human endogenous provirus: chromosomal localization and evolutionary relationship. Virology 138: 225-235.
Ohshima, K., Koishi, R., Matsuo, M. and Okada, N. 1993. Several short interspersed repetitive elements (SINEs) in distant species may have originated from a common ancestral retrovirus: characterization of a squid SINE and a possible mechanism for generation of tRNA-derived retroposons. Proc. Natl. Acad. Sci. USA 90: 6260-6264.
Onno, M., Nakamura, T., Hillova, J. and Hill, M. 1992.
Rearrangement of the human tre oncogene by homologous recombination between Alu repeats of nucleotide sequences from two different chromosomes. Oncogene 7: 2519-2523.
Ono, M. 1986. Molecular cloning and long terminal repeat sequences of human endogenous retrovirus genes related to types A and B retrovirus genes. J. Virol. 58: 937-944.
Ono, M. and Ohishi, H. 1983. Long terminal repeat sequences of intracisternal A particle genes in the Syrian hamster genome: identification of tRNAPhe as a putative primer tRNA. Nucl. Acids Res. 11: 7169-7179.
Ono, M., Yasunaga, T., Miyata, T. and Ushikubo, H. 1986. Nucleotide sequence of human endogenous retrovirus genome related to the mouse mammary tumor virus genome. J. Virol. 60: 589-598.
Ono, M., Kawakami, M. and Takezawa, T. 1987a. A novel human nonviral retroposon derived from an endogenous retrovirus. Nucl. Acids Res. 15: 8725-8737.
Ono, M., Kawakami, M. and Ushikubo, H. 1987b. Stimulation of expression of the human endogenous retrovirus genome by female steroid hormones in human breast cancer cell line T47D. J. Virol. 61: 2059-2062.
Orgel, L.E. and Crick, F.H. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.
Paquin, C.E. and Williamson, V.M. 1984. Temperature effects on the rate of Ty transposition. Science 226: 53-55.
Pardue, M.L. 1991. Dynamic instability of chromosomes and genomes. Cell 66: 427-431.
Parkhurst, S.M. and Corces, V.G. 1985. forked, gypsys, and suppressors in Drosophila. Cell 41: 429-437.
Parkhurst, S.M. and Corces, V.G. 1986. Mutations at the suppressor of forked locus increase the accumulation of gypsy-encoded transcripts in Drosophila melanogaster. Mol. Cell. Biol. 6: 2271-2274.
Paulson, K., Deka, N., Schmid, C., Misra, R., Schindler, C., Rush, M., Kadyk, L. and Leinwand, L. 1985. A transposon-like element in human DNA. Nature 316: 359-361.
Paulson, K.E., Matera, A.G., Deka, N. and Schmid, C.W. 1987. Transcription of a human transposon-like sequence is usually directed by other promoters. Nucl. Acids Res.
15: 5199-5215.
Paulson, K.E. and Schmid, C.W. 1986. Transcriptional inactivity of Alu repeats in HeLa cells. Nucl. Acids Res. 14: 6145-6158.
Pelisson, A., Finnegan, D.J. and Bucheton, A. 1991. Evidence for retrotransposition of the I factor, a LINE element of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 88: 4907-4910.
Perl, A., Rosenblatt, J., Chen, I., DiVincenzo, J., Bever, R., Poiesz, J. and Abraham, G. 1989. Detection and cloning of new HTLV-related endogenous sequences in man. Nucl. Acids Res. 17: 6841-6854.
Perl, A., Isaacs, C., Eddy, R., Byers, M., Sait, S. and Shows, T. 1991. The human T-cell leukemia virus-related endogenous sequence (HRES1) is located on chromosome 1 at q42. Genomics 11: 1172-1173.
Peronnet, F., Becker, J.L., Becker, J., D'Auriol, L., Galibert, F. and Best-Belpomme, M. 1986. 1731, a new retrotransposon with hormone modulated expression. Nucl. Acids Res. 14: 9017-9033.
Pizzuti, A., Pieretti, M., Fenwick, R., Gibbs, R. and Caskey, C. 1992. A transposon-like element in the deletion-prone region of the dystrophin gene. Genomics 13: 594-600.
Pryciak, P.M. and Varmus, H.E. 1992. Nucleosomes, DNA-binding proteins, and DNA sequence modulate retroviral integration target site selection. Cell 69: 769-780.
Quentin, Y. 1988. The Alu family developed through successive waves of fixation closely connected with primate lineage history. J. Mol. Evol. 27: 194-202.
Quentin, Y. 1992a. Fusion of a free left Alu monomer and a free right Alu monomer at the origin of the Alu family in the primate genomes. Nucl. Acids Res. 20: 487-493.
Quentin, Y. 1992b. Origin of the Alu family: a family of Alu-like monomers gave birth to the left and the right arms of the Alu elements. Nucl. Acids Res. 20: 3397-3401.
Query, C.C. and Keene, J.D. 1987. A human autoimmune protein associated with U1 RNA contains a region of homology that is cross-reactive with retroviral p30 gag antigen. Cell 51: 211-220.
Rabson, A., Steele, P., Garon, C. and Martin, M. 1983.
mRNA transcripts related to full-length endogenous retroviral DNA in human cells. Nature 306: 604-607. Rabson, A., Hamagishi, Y., Steele, P., Tykocinski, M. and Martin, M. 1985. Characterization of human endogenous retroviral envelope RNA transcripts. J. Virol. 56: 176-182. Reed, K.C. and Mann, D.A. 1985. Rapid transfer of DNA from agarose gels to nylon membranes. Nucl. Acids Res. 13: 7207-7221. Reeves, R.H. and O’Brien, S.J. 1984. Molecular genetic characterization of the RD-114 gene family of endogenous feline retroviral sequences. J. Virol. 52: 164-171. Renan, M.J. and Reeves, B.R. 1987. Chromosomal localization of human endogenous retroviral element ERV1 to 18q22-q23 by in situ hybridization. Cytogenet. Cell Genet. 44: 167-170. Repaske, R., O’Neill, R., Steele, P. and Martin, M. 1983. Characterization and partial nucleotide sequence of endogenous type C retrovirus segments in human chromosomal DNA. Proc. Natl. Acad. Sci. USA 80: 678-682. Repaske, R., Steele, P., O’Neill, R., Rabson, A. and Martin, M. 1985. Nucleotide sequence of a full-length human endogenous retroviral segment. J. Virol. 54: 764-772. Reuss, F.U. and Schaller, H.C. 1991. cDNA sequence and genomic characterization of intracisternal-A-particle-related retroviral elements containing an envelope gene. J. Virol. 65: 5702-5709. Ricke, D.O., Ketterling, R.P. and Sommer, S.S. 1992. PRE: a novel element with the hallmarks of a retrotransposon derived from an unknown structural RNA. Nucl. Acids Res. 20: 5233. Rohdewohld, H., Weiher, H., Reik, W., Jaenisch, R. and Breindl, M. 1987. Retrovirus integration and chromatin structure: Moloney murine leukemia virus gene expression. Proc. Natl. Acad. Sci. USA 84: 4919-4923. Rolfe, M., Spanos, A. and Banks, G. 1986. Induction of yeast Ty element transcription by ultraviolet light. Nature 319: 339-340. Rouyer, F., Simmler, M.-C., Page, D.C. and Weissenbach, J. 1987. A sex chromosome rearrangement in a human XX male caused by Alu-Alu recombination. 
Cell 51: 417-425. Sakoyama, Y., Hong, K.-J., Byun, S.M., Hisajima, H., Ueda, S., Yaoita, Y., Hayashida, H., Miyata, T. and Honjo, T. 1987. Nucleotide sequences of immunoglobulin genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution. Proc. Natl. Acad. Sci. USA 84: 1080-1084. Sambrook, J., Fritsch, E.F. and Maniatis, T. 1989. Molecular cloning. A laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. Samuelson, L., Wiebauer, K., Snow, C. and Meisler, M. 1990. Retroviral and pseudogene insertion sites reveal the lineage of human salivary and pancreatic amylase genes from a single gene during primate evolution. Mol. Cell. Biol. 10: 2513-2520. Scherdin, U., Rhodes, K. and Breindl, M. 1990. Transcriptionally active genome regions are preferred targets for retrovirus integration. J. Virol. 64: 907-912. Schiff, R., Itin, A. and Keshet, E. 1991. Transcriptional activation of mouse retrotransposons in vivo: specific expression in steroidogenic cells in response to trophic hormones. Genes Dev. 5: 521-532. Schmid, C.W. and Jelinek, W.R. 1982. The Alu family of dispersed repetitive sequences. Science 216: 1065-1070. Schmid, C. and Maraia, R. 1992. Transcriptional regulation and transpositional selection of active SINE sequences. Curr. Opin. Genet. Devel. 2: 874-882. Schmid, C., Wong, E. and Deka, N. 1990. Single copy sequences in galago DNA resemble a repetitive human retrotransposon-like family. J. Mol. Evol. 31: 92-100. Schwartz, D.A., Dahm, M.W., Bai, L., Carnie, S. and Norris, J.S. 1993. Construction of a retrotransposition indicator sequence using a neomycin resistance-encoding gene containing a functional intron. Gene 127: 233-236. Schwarz-Sommer, Z., Leclercq, L., Gobel, E. and Saedler, H. 1987. Cin4, an insert altering the structure of the A1 gene in Zea mays, exhibits properties of nonviral retrotransposons. 
EMBO J. 6: 3873-3880. Scott, A.F., Schmeckpeper, B.J., Abdelrazik, M., Comey, C.T., O’Hara, B., Rossiter, J.P., Cooley, T., Heath, P., Smith, K.D. and Margolet, L. 1987. Origin of the human L1 elements: proposed progenitor genes deduced from a consensus DNA sequence. Genomics 1: 113-125. Segal-Bendirdjian, E. and Heidmann, T. 1991. Evidence for a reverse transcription intermediate for a marked LINE transposon in tumoral rat cells. Biochem. Biophys. Res. Commun. 181: 863-870. Servomaa, K. and Rytomaa, T. 1988. Suicidal death of rat chloroleukemia cells by activation of the long interspersed repetitive DNA element (L1Rn). Cell and Tissue Kinetics 21: 33-43. Servomaa, K. and Rytomaa, T. 1990. UV light and ionizing radiations cause programmed death of rat chloroleukaemia cells by inducing retropositions of a mobile DNA element (L1Rn). Int. J. Radiat. Biol. 57: 331-343. Shen, M.R., Batzer, M.A. and Deininger, P.L. 1991. Evolution of the master Alu gene(s). J. Mol. Evol. 33: 311-320. Shiba, T. and Saigo, K. 1983. Retrovirus-like particles containing RNA homologous to the transposable element copia in Drosophila melanogaster. Nature 302: 119-124. Shih, A., Coutavas, E.E. and Rush, M.G. 1991. Evolutionary implications of primate endogenous retrovirus. Virology 182: 495-502. Shih, C.-C., Stoye, J.P. and Coffin, J.M. 1988. Highly preferred targets for retrovirus integration. Cell 53: 531-537. Shimotohno, K., Takahashi, Y., Shimizu, N., Gojobori, T., Golde, D., Chen, I., Miwa, M. and Sugimura, T. 1985. Complete nucleotide sequence of an infectious clone of human T-cell leukemia virus type II: an open reading frame for the protease gene. Proc. Natl. Acad. Sci. USA 82: 3101-3105. Shinnick, T.M., Lerner, R.A. and Sutcliffe, J.G. 1981. Nucleotide sequence of Moloney murine leukemia virus. Nature 293: 543-548. Shippen-Lentz, D. and Blackburn, E.H. 1990. Functional evidence for an RNA template in telomerase. Science 247: 546-552. Sibley, C.G. and Ahlquist, J.E. 1987. 
DNA hybridization evidence of hominoid phylogeny: results from an expanded data set. J. Mol. Evol. 26: 99-121. Silver, J., Rabson, A., Bryan, T., Willey, R. and Martin, M. 1987. Human retroviral sequences on the Y chromosome. Mol. Cell. Biol. 7: 1559-1562. Singer, M. 1982. SINEs and LINEs: highly repeated short and long interspersed sequences in mammalian genomes. Cell 28: 433-434. Singer, M.F., Skowronski, J., Fanning, T.G. and Mongkolsuk, S. The functional potential of the human LINE-1 family of interspersed repeats. In: Lambert, M.E., McDonald, J.F. and Weinstein, I.B. (Eds.), Eukaryotic Transposable Elements as Mutagenic Agents. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1988, pp. 71-72. Sinnett, D., Richer, C., Deragon, J-M. and Labuda, D. 1991. Alu RNA secondary structure consists of two independent 7SL RNA-like folding units. J. Biol. Chem. 266: 8675-8678. Sinnett, D., Richer, C., Deragon, J-M. and Labuda, D. 1992. Alu RNA transcripts in human embryonal carcinoma cells: model of post-transcriptional selection of master sequences. J. Mol. Biol. 226: 689-706. Skowronski, J., Fanning, T.G. and Singer, M.F. 1988. Unit-length LINE-1 transcripts in human teratocarcinoma cells. Mol. Cell. Biol. 8: 1385-1397. Skowronski, J. and Singer, M.F. 1985. Expression of a cytoplasmic LINE-1 transcript is regulated in a human teratocarcinoma cell line. Proc. Natl. Acad. Sci. USA 82: 6050-6054. Skowronski, J. and Singer, M.F. 1986. The abundant LINE-1 family of repeated DNA sequences in mammals: genes and pseudogenes. Cold Spring Harbor Symp. Quant. Biol. 51: 457-463. Slightom, J.L., Blechl, A.E. and Smithies, O. 1980. Human fetal Gγ- and Aγ-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21: 627-638. Smit, A.F.A. 1993. Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucl. Acids Res. 21: 1863-1872. Snyderman, R. and Cianciolo, G.J. 1984. 
Immunosuppressive activity of the retroviral envelope protein P15E and its possible relationship to neoplasia. Immunol. Today 5: 240-244. Soares, M.B., Schon, E., Henderson, A., Karathanasis, S.K., Cate, R., Zeitlin, S., Chirgwin, J. and Efstratiadis, A. 1985. RNA-mediated gene duplication: the rat preproinsulin I gene is a functional retroposon. Mol. Cell. Biol. 5: 2090-2103. Stavenhagen, J.B. and Robins, D.M. 1988. An ancient provirus has imposed androgen regulation on the adjacent mouse sex-limited protein gene. Cell 55: 247-254. Steele, P., Rabson, A., Bryan, T. and Martin, M. 1984. Distinctive termini characterize two families of human endogenous retroviral sequences. Science 225: 943-947. Steele, P., Martin, M., Rabson, A., Bryan, T. and O’Brien, S. 1986. Amplification and chromosomal dispersion of human endogenous retroviral sequences. J. Virol. 59: 545-550. Stout, J.T. and Caskey, C.T. 1985. HPRT: Gene structure, expression, and mutation. Ann. Rev. Genet. 19: 127-148. Stoye, J.P., Fenner, S., Greenoak, G.E., Moran, C. and Coffin, J.M. 1988. Role of endogenous retroviruses as mutagens: the hairless mutation of mice. Cell 54: 383-391. Strand, D. and McDonald, J. 1985. Copia is transcriptionally responsive to environmental stress. Nucl. Acids Res. 13: 4401-4410. Streydio, C., Swillens, S., Georges, M., Szpirer, C. and Vassart, G. 1990. Structure, evolution and chromosomal localization of the human pregnancy-specific β1 glycoprotein gene family. Genomics 6: 579-592. Sun, L., Paulson, K.E., Schmid, C.W., Kadyk, L. and Leinwand, L. 1984. Non-Alu family interspersed repeats in human DNA and their transcriptional activity. Nucl. Acids Res. 12: 2669-2690. Suzuki, H., Hosokawa, Y., Toda, H., Nishikimi, M. and Ozawa, T. 1990. Common protein-binding sites in the 5’-flanking regions of human genes for cytochrome C1 and ubiquinone-binding protein. J. Biol. Chem. 265: 8159-8163. Swergold, G.D. 1990. Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol. Cell. 
Biol. 10: 6718-6729. Tabor, S. and Richardson, C.C. 1987. DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA 84: 4767-4771. Taruscio, D. and Manuelidis, L. 1991. Integration site preferences of endogenous retroviruses. Chromosoma 101: 141-156. Tchenio, T. and Heidmann, T. 1991. Defective retroviruses can disperse in the human genome by intracellular transposition. J. Virol. 65: 2113-2118. Tchenio, T. and Heidmann, T. 1992. High frequency intracellular transposition of a defective mammalian provirus detected by an in situ colorimetric assay. J. Virol. 66: 1571-1578. Tchenio, T., Segal-Bendirdjian, E. and Heidmann, T. 1993. Generation of processed pseudogenes in murine cells. EMBO J. 12: 1487-1497. Temin, H. 1980. Origin of retroviruses from cellular moveable genetic elements. Cell 21: 599-600. Temin, H.M. 1982. Function of the retrovirus long terminal repeat. Cell 28: 3-5. Thomas, K.R. and Capecchi, M.R. 1987. Site-directed mutagenesis by gene targeting in mouse embryo-derived stem cells. Cell 51: 503-512. Ting, C.-N., Rosenberg, M.P., Snow, C.M., Samuelson, L.C. and Meisler, M.H. 1992. Endogenous retroviral sequences are required for tissue-specific expression of a human salivary amylase gene. Genes Devel. 6: 1457-1465. Tomita, N., Horii, A., Doi, S., Yokouchi, H., Ogawa, M., Mori, T. and Matsubara, K. 1990. Transcription of human endogenous retroviral long terminal repeat (LTR) sequence in a lung cancer cell line. Bioch. Biophys. Res. Comm. 166: 1-10. Trauger, R.J., Talbott, R., Wilson, S.H., Karpel, R.L. and Elder, J.H. 1990. A single-stranded nucleic acid binding sequence common to the heterogenous nuclear ribonucleoparticle protein A1 and murine recombinant virus GP70. J. Biol. Chem. 265: 3674-3678. Ullu, E. and Tschudi, C. 1984. Alu sequences are processed 7SL RNA genes. Nature 312: 171-172. Ullu, E. and Weiner, A.M. 1985. Upstream sequences modulate the internal promoter of the human 7SL RNA gene. Nature 318: 371-374. Varmus, H. 
and Brown, P. 1989. Retroviruses. In: Mobile DNA, Berg, D.E. and Howe, M.M. (Eds). American Society for Microbiology, Washington, D.C., pp 53-108. Vidaud, D., Vidaud, M., Bahnak, B.R., Siguret, V., Sanchez, S.G., Laurian, Y., Meyer, D., Goossens, M. and Lavergne, J.M. 1993. Haemophilia B due to a de novo insertion of a human-specific Alu subfamily member within the coding region of the factor IX gene. Eur. J. Hum. Genet. 1: 30-36. Vijaya, S., Steffen, D.L. and Robinson, H.L. 1986. Acceptor sites for retroviral integrations map near DNase I-hypersensitive sites in chromatin. J. Virol. 60: 683-692. Vogelstein, B., Fearon, E.R., Hamilton, S.R., Kern, S.E., Preisinger, A.C., Leppert, M., Nakamura, Y., White, R., Smits, A.M.M. and Bos, J.L. 1988. Genetic alteration during colorectal-tumour development. New Eng. J. Med. 319: 525-532. von Sternberg, R.M., Novick, G.E., Gao, G.-P. and Herrera, R.J. 1992. Genome canalization: the coevolution of transposable and interspersed repetitive elements with single copy DNA. Genetica 86: 215-246. Wallace, M.R., Anderson, L.B., Saulino, A.M., Gregory, P.E., Glover, T.W. and Collins, F.S. 1991. A de novo Alu insertion results in neurofibromatosis type 1. Nature 353: 864-866. Weiner, A., Deininger, P. and Efstratiadis, A. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Ann. Rev. Biochem. 55: 631-661. Werner, T., Brack-Werner, R., Leib-Mosch, C., Backhaus, H., Erfle, V. and Hehlmann, R. 1990. S71 is a phylogenetically distinct human endogenous retroviral element with structural and sequence homology to simian sarcoma virus (SSV). Virology 174: 225-238. Westley, B. and May, F. 1984. The human genome contains multiple sequences of varying homology to mouse mammary tumor virus DNA. Gene 28: 221-227. Whitcomb, J.M. and Hughes, S.H. 1992. Retroviral reverse transcription and integration: progress and problems. Annu. Rev. Cell Biol. 8: 275-306. Wilke, C.M. and Adams, J. 1992. 
Fitness effects of Ty transposition in Saccharomyces cerevisiae. Genetics 131: 31-42. Wilkinson, D.A. 1993. Expression of the RTVL-H family of human endogenous retrovirus-like sequences. Ph.D. dissertation. University of British Columbia, Vancouver, B.C., Canada. Wilkinson, D.A., Freeman, J.D., Goodchild, N.L., Kelleher, C.A. and Mager, D.L. 1990. Autonomous expression of RTVL-H endogenous retrovirus like elements in human cells. J. Virol. 64: 2157-2167. Wilkinson, D.A., Goodchild, N.L., Saxton, T.M., Wood, S. and Mager, D.L. 1993. Evidence for a functional subclass of the RTVL-H family of human endogenous retrovirus-like sequences. J. Virol. 67: 2981-2989. Wilkinson, D.A. and Mager, D.L. 1993. RTVL-H human endogenous retrovirus-like elements with an intact pol and an env-related region. J. Cancer Res. Clin. Oncol. 199 (Suppl 1): S8. Wilkinson, D.A., Mager, D.L. and Leong, J.C. Endogenous human retroviruses. In: The Retroviridae, Volume 3. Levy, J. (Ed). Plenum Press, New York. (in press). Williams, K.J. and Loeb, L.A. 1992. Retroviral reverse transcriptases: error frequencies and mutagenesis. Curr. Top. Microbiol. Immunol. 176: 165-180. Wilson, M.C., Policastro, P.F. and Fredholm, M. 1988. Regulation of expression and transposition of murine endogenous retroviral elements. Banbury reports 30: Transposable elements as mutagenic agents. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York. pp. 131-144. Xiong, Y. and Eickbush, T. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9: 3353-3362. Xu, H. and Boeke, J.D. 1990. Localization of sequences required in cis for yeast Ty1 element transposition near the long terminal repeats: analysis of mini-Ty1 elements. Mol. Cell. Biol. 10: 2695-2702. Yeh, K.-W., Yen, C.-P., Liu, J.-C., Feng, Y.-N., Wu, F.Y.-H., Yang, W.-K. and Wu, C.-W. 1991. Isolation of a cDNA clone of human endogenous retrovirus (HERV) from human cancer cell line. FASEB J. 
5: A884. Ymer, S., Tucker, W.Q.J., Sanderson, C.J., Hapel, A.J., Campbell, H.D. and Young, I.G. 1985. Constitutive synthesis of interleukin-3 by leukaemia cell line WEHI-3B is due to retroviral insertion near the gene. Nature 317: 255-258. Yoshioka, K., Honma, H., Zushi, M., Kondo, S., Togashi, S., Miyake, T. and Shiba, T. 1990. Virus-like particle formation of Drosophila copia through autocatalytic processing. EMBO J. 9: 535-541. Yoshioka, K., Kanda, H., Akiba, H., Enoki, M. and Shiba, T. 1991. Identification of an unusual structure in the Drosophila melanogaster transposable element copia: evidence for copia transposition through an RNA intermediate. Gene 103: 179-184. Zucchi, I. and Schlessinger, D. 1992. Distribution of moderately repetitive sequences pTR5 and LF1 in Xq24-q28 human DNA and their use in assembling YAC contigs. Genomics 12: 264."@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "1994-05"@en ; edm:isShownAt "10.14288/1.0088201"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Genetics"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en ; ns0:scholarLevel "Graduate"@en ; dcterms:title "The impact of endogenous retrovirus-like sequences on the human genome"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/6917"@en .