Open Collections

UBC Theses and Dissertations


Transcriptional regulation of human genes by endogenous retroviral elements Landry, Josette-Renee 2003



TRANSCRIPTIONAL REGULATION OF HUMAN GENES BY ENDOGENOUS RETROVIRAL ELEMENTS

by Josette-Renee Landry

B.Sc. (Honours), McGill University, 1998

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Department of Medical Genetics, Genetics Graduate Programme)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 2003
© Josette-Renee Landry, 2003

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of [illegible]
The University of British Columbia
Vancouver, Canada
Date [illegible]

Abstract

Human endogenous retroviruses (HERVs) and other long terminal repeat (LTR)-containing elements comprise a significant portion (8%) of the human genome and are likely vestiges of retroviral infections during primate evolution. Although the vast majority of HERVs are now unable to code for retroviral proteins, an unknown number have retained functional transcriptional elements within their LTRs and some of these regulatory sequences have been shown to participate in the transcription of nearby genes.
The overall objective of my thesis was to further understand the role of HERVs in human gene regulation by investigating LTRs that provide alternative promoters to cellular genes. When I began my study, three putative endogenous retroviral promoters were identified by screening sequence databases for chimeric (viral-cellular) transcripts. These searches revealed fusion transcripts containing the LTRs of three HERV-E elements linked to the endothelin B receptor (EDNRB), the apolipoprotein C-I (APOC1) and the Opitz syndrome gene, midline 1 (MID1). To confirm the authenticity of the chimeric transcripts and to establish that the mRNAs were transcribed from the retroviral LTRs, we performed 5' RACE and determined the genomic organization for each gene. Our results indicated that the chimeric transcripts were alternatively promoted by the retroviral elements, as they initiated within HERV-E LTRs but spliced into the downstream coding sequence of the cellular genes. To determine the expression pattern and the relative contribution of the retroviral promoter, we quantified the percentage of transcripts that were chimeric in various tissues using real-time PCR. While chimeric APOC1 transcripts could be detected in several of the tissues tested, the retroviral promoters of EDNRB and MID1 appeared to be placenta-specific. Transient transfection studies supported a role for the EDNRB and MID1 LTRs as strong promoters in placenta and suggested a function for the LTRs as enhancers. Further deletion and hybrid constructs delineated regions within both LTRs necessary for strong promoter activity. Finally, to further characterize the APOC1, EDNRB and MID1 genes, the non-retroviral (native) promoters of these three genes were also analysed. These findings provide further evidence that some endogenous retroviruses have evolved a biological function as transcriptional regulatory elements by contributing alternative promoters to human genes.
Table of contents

Abstract
List of Tables
List of Figures
List of Abbreviations
Acknowledgements

Chapter 1: Introduction
  1.1 Human Genome
  1.2 Repetitive Elements
  1.3 Short Interspersed Nuclear Elements (SINEs)
  1.4 Long Interspersed Nuclear Elements (LINEs)
  1.5 Human Endogenous Retroviruses (HERVs)
    1.5.1 Origin and structure
      Origin
      Attributes of exogenous retroviruses
      Characteristics of endogenous retroviruses
    1.5.2 Classification and diversity
      Nomenclature and diversity
      Abundance and distribution
      Age and polymorphism
    1.5.3 Activity and expression
      RNA expression and promoter activity
      Transcription regulation
      Protein expression
    1.5.4 Biological consequences
      Pathological implications
      Genome evolution
      Coding regions
      Gene regulation
  1.6 Thesis Objectives and Organization

Chapter 2: Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Rapid amplification of cDNA ends
    2.2.2 Reverse Transcription and PCR Amplification
    2.2.3 Plasmid Constructs
    2.2.4 Cell Lines and Transient Transfections
    2.2.5 Locus-specific PCR
  2.3 Results
    2.3.1 Identification and characterization of chimeric transcripts
    2.3.2 Genomic structure and transcript forms
    2.3.3 Evolutionary age of the LTRs
    2.3.4 Significance of the HERV-E LTRs in expression of APOC1 and EDNRB
  2.4 Discussion

Chapter 3: Functional analysis of the endogenous retroviral promoter of the human endothelin B receptor gene
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 Reverse transcription and Real-time PCR
    3.2.2 Sequence analysis
    3.2.3 Plasmid constructions
    3.2.4 Cell culture and transient transfections
  3.3 Results
    3.3.1 Expression pattern and contribution of APOC1 and EDNRB chimeric transcripts
    3.3.2 Transcriptional activity of the retroviral promoters
    3.3.3 Mapping of the EDNRB LTR promoter
    3.3.4 Hybrid construct experiments
    3.3.5 Site-directed mutagenesis of the EDNRB LTR
  3.4 Discussion

Chapter 4: The Opitz syndrome gene MID1 is transcribed from a human endogenous retroviral promoter
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Database searches and sequence analysis
    4.2.2 Rapid amplification of cDNA ends
    4.2.3 Reverse transcription and Real-time PCR
    4.2.4 Plasmid constructions
    4.2.5 Cell culture and transient transfections
    4.2.6 Genomic typing for the MID1 HERV-E LTR
  4.3 Results
    4.3.1 Identification and characterization of the MID1 chimeric transcript
    4.3.2 Tissue specificity of the chimeric MID1 transcript
    4.3.3 Functional analysis of the retroviral promoter
    4.3.4 Deletion study of the retroviral promoter
    4.3.5 Analysis of the integration time and potential impact of the retroviral element
  4.4 Discussion

Chapter 5: Alternative promoters, conserved between human and rodents, control the expression of MID1
  5.1 Introduction
  5.2 Materials and Methods
    5.2.1 Database searches and sequence analysis
    5.2.2 Rapid amplification of cDNA ends
    5.2.3 Dot blot hybridizations
    5.2.4 Real-time PCR
    5.2.5 Plasmid constructions
    5.2.6 Cell culture and transfection conditions
    5.2.7 Isolation of the 1C and 1E genomic regions in other species
  5.3 Results
    5.3.1 Characterization of the genomic structure of the alternative first exons of MID1
    5.3.2 Identification of the 5' ends of human MID1 isoforms
    5.3.3 Tissue distribution of alternative MID1 5' UTRs in human
    5.3.4 Expression pattern of total MID1 transcripts in human tissues
    5.3.5 Sequence analysis of the regions upstream of the alternative 5' UTRs
    5.3.6 Promoter activity of the 5' flanking sequence of exons 1C, 1D and 1E
    5.3.7 Isolation and comparison of variant MID1 first exons in other mammals
    5.3.8 Comparison of MID1 promoter regions C and E between species
  5.4 Discussion

Chapter 6: Conclusion and General Discussion

Bibliography

Appendix: Repetitive elements in the 5' untranslated region of a human zinc finger gene modulate transcription and translation efficiency

List of Tables

Table 1.1 Characteristics of some HERV families
Table 1.2 Putative biological effects of HERVs and other endogenous retroviruses
Table 1.3 HERVs involved in the transcriptional regulation of human genes
Table 2.1 Primers used for APOC1 and EDNRB genomic and cDNA amplifications
Table 3.1 Oligonucleotides used for APOC1 and EDNRB constructs and real-time study
Table 4.1 Primers used for MID1 genomic and cDNA amplifications
Table 5.1 Oligonucleotides used for MID1 5' RACE and other procedures

List of Figures

Figure 1.1 Classes of transposable elements in the human genome
Figure 1.2 Genetic organization of simple retrovirus
Figure 1.3 Structure of endogenous retroviral elements and related sequences
Figure 1.4 Transcriptional control elements of LTRs
Figure 1.5 LTR involvement in cellular gene regulation
Figure 2.1 Detection of APOC1 and EDNRB chimeric transcripts in different tissues
Figure 2.2 Schematic representation of the human APOC1 gene transcript isoforms
Figure 2.3 Organization of the alternative 5' UTRs of the human EDNRB gene
Figure 2.4 Multiple sequence alignment of APOC1 LTR of different primates
Figure 2.5 Multiple sequence alignment of EDNRB LTR in various primates
Figure 2.6 Effect of the LTR on APOC1 promoter activity in human and baboon
Figure 2.7 Promoter and enhancer activity of the EDNRB LTR
Figure 3.1 Proportion of APOC1 and EDNRB transcripts contributed by the LTRs
Figure 3.2 Promoter activity of the APOC1 and EDNRB LTRs
Figure 3.3 Sequence comparison of the retroviral promoters of the EDNRB and APOC1 genes
Figure 3.4 Effects of 5' deletions on the transcriptional activity of the EDNRB LTR
Figure 3.5 Confirmation of a cis-element between position 111-122 of the EDNRB retroviral promoter
Figure 3.6 Fusion study of the APOC1 and EDNRB LTRs
Figure 3.7 Mutational analysis of position 176 to 215 of the EDNRB retroviral promoter
Figure 4.1 Characterization of human retroviral-MID1 chimeric transcripts
Figure 4.2 Comparison of overall MID1 and chimeric MID1 expression levels
Figure 4.3 Promoter and enhancer activity of the retroviral LTR
Figure 4.4 Deletion analysis of the retroviral LTR promoter
Figure 4.5 Evolutionary analysis of the HERV-E element
Figure 5.1 Genomic organization of alternative MID1 first exons
Figure 5.2 Nucleotide sequence of the heterogeneous human MID1 5' UTRs
Figure 5.3 Tissue-specificity of MID1 exons 1a, 1b, 1c and 1e
Figure 5.4 MID1 overall expression level as measured by real-time PCR
Figure 5.5 Functional analysis of alternative MID1 promoters
Figure 5.6 Alternative Mid1 transcripts in other species
Figure 5.7 Sequence alignment of the human, rodent, and porcine first exons of MID1
Figure 5.8 Multiple sequence alignment of MID1 promoter regions C and E
Abbreviations

aa  amino acid
ADH1C  alcohol dehydrogenase 1C
ALV  avian leukemia virus
AMY  amylase
APOC1  apolipoprotein C-I
AZFa  azoospermia factor a
bp  base pair
BRCA1  breast cancer 1
CA  capsid
cDNA  complementary DNA
cORF  central open reading frame
CD8  cluster of differentiation 8
DMEM  Dulbecco's modified Eagle's medium
DNA  deoxyribonucleic acid
dUTPase  deoxyuridine triphosphatase
EDNRB  endothelin B receptor
Env  envelope
ERV  endogenous retrovirus
EST  expressed sequence tag
ETn  early transposon
FeLV  feline leukemia virus
Gag  group antigen
GAPDH  glyceraldehyde-3-phosphate dehydrogenase
GHR  growth hormone receptor
HERV  human endogenous retrovirus
HCR  hepatic control region
HIV  human immunodeficiency virus
HHLA  HERV-H LTR-associating
HML  human endogenous MMTV-like
HTLV  human T-cell leukemia virus
IAP  intracisternal A type particle
IDDM  insulin-dependent diabetes mellitus
IN  integrase
kb  kilobase
LCR  locus control region
LINE  long interspersed nuclear element
LTR  long terminal repeat
MA  matrix
MaLR  mammalian apparent LTR retrotransposon
Mb  megabase
MHC  major histocompatibility complex
MID1  midline 1
MIR  mammalian-wide interspersed repeats
MLV  murine leukemia virus
MMTV  mouse mammary tumour virus
mRNA  messenger RNA
MS  multiple sclerosis
MYr  million years
NC  nucleocapsid
NWM  new world monkey
OBR  leptin receptor
OC90  otoconin-90
ORF  open reading frame
OS  Opitz syndrome
OWM  old world monkey
PBS  primer binding site
PCR  polymerase chain reaction
PLA2L  phospholipase A2 like
PLK  provirus-linked Krüppel related
PLT  placental LTR terminated
PLZF  promyelocytic leukemia zinc finger protein
Pol  polymerase
Pro  protease
PTN  pleiotrophin
R  repeat
RACE  rapid amplification of cDNA ends
Rev/Rex  regulator of expression of viral proteins
RNA  ribonucleic acid
RPMI  Roswell Park Memorial Institute
RT  reverse transcriptase
RT-PCR  reverse transcriptase-polymerase chain reaction
SA  splice acceptor
Sag  superantigen
SD  splice donor
SDS  sodium dodecyl sulphate
SINE  short interspersed nuclear element
SSC  standard sodium citrate
SU  surface
Tat  trans-activator
TM  transmembrane
tRNA  transfer RNA
UTR  untranslated region
U3  unique 3'
U5  unique 5'
VEGF  vascular endothelial growth factor
ZNF  zinc finger

Acknowledgements

I would first like to thank my parents who supported me both morally and financially throughout my university years. Their encouragement from early on played a primordial role in my academic achievements and my decision to pursue a career in science. I would then like to thank Dixie for believing in my potential as a researcher and giving me the opportunity to conduct interesting independent research. Her great supervision and foresight (in accepting another graduate student in 1998) made the last 5 years of my life quite enriching. I would also like to acknowledge the help of Patrik Medstrand, who acted as a mentor at the beginning of my PhD, and of several fellow students, post-docs and techs who made the Mager lab a great environment to work in. I would like to acknowledge the various organizations that funded my doctoral research. The short-term exchange grant provided by the CIHR Institute of Genetics made my attendance at several bioinformatics workshops possible, and the monetary support provided by an MRC studentship as well as a Michael Smith trainee award contributed to my participation in several meetings and symposiums (including a memorable conference in Italy!). Finally, and most importantly, I would like to thank my husband, collaborator and best friend, Brian.

Chapter 1: Introduction

1.1 Human Genome

An important milestone of the Human Genome Project was completed in February 2001 with the publication of the human genome draft sequence.
Initial analysis of the assembled sequence revealed a surprising genomic landscape, containing fewer genes and more repetitive sequences than previously estimated. Although the number of human genes had recently been predicted to be as high as 120,000 (Liang et al. 2000), fewer than 35,000 genes were uncovered in the draft sequence (Lander et al. 2001). This lower gene number suggests that the human genome contains only twice as many genes as the fly and worm genomes (Lander et al. 2001). Combined, the annotated human coding regions correspond to only approximately 1.5% of the 2.9 Gb human genome, while nearly half of the sequence can be recognized as remnants of transposable elements (Lander et al. 2001). Mobile sequences are significant constituents of most eukaryotic genomes, although their proportions and types vary extensively between species (Bennetzen et al. 1998; Lander et al. 2001; Bartolome et al. 2002; Waterston et al. 2002). The density of interspersed repeats is much higher in human than in the genomes of other sequenced organisms, such as D. melanogaster and C. elegans, where only 3% and 6.5% respectively of the nuclear DNA consists of mobile elements (Lander et al. 2001). On the other hand, the interspersed repetitive content of 45% in human resembles the proportion of the mouse genome, 39%, identified as being derived from transposable sequences (Waterston et al. 2002).

1.2 Repetitive Elements

The human genome harbours several classes of transposable sequences that are broadly classified in two groups based on their mode of transposition. The first group consists of DNA transposons, which together account for approximately 3% of nuclear DNA (Lander et al. 2001). Members of this group, which include Charlie and Tigger elements, resemble bacterial transposons in structure as they are flanked by terminal inverted repeats and encode a transposase (Smit and Riggs 1996; Smit 1999) (see Figure 1.1).
Autonomous DNA transposons are mobilized by the transposases in a cut and paste fashion by which the elements are excised from the nuclear DNA and reinserted in a different genomic location. So far, no transpositionally active DNA transposons have been isolated from the human genome (Smit 1999). However, several human genes, including the RAG1 and RAG2 recombinases (Agrawal et al. 1998; Hiom et al. 1998), appear to have evolved from ancient DNA transposons. The second and most abundant group of mobile sequences are the retroelements, which unlike the DNA transposons, move via a copy and paste mechanism. Retroelements proliferated through the reverse transcription of RNA intermediates followed by the integration of the cDNA copies into additional genomic sites and now comprise at least 40% of the human genome (Lander et al. 2001). Two major classes of retroelements are usually distinguished on the basis of the presence of long terminal repeats (LTR). While SINEs (Short Interspersed Nuclear Elements) and LINEs (Long Interspersed Nuclear Elements) lack LTRs, retroviral sequences and other related sequences are equipped with them (see Figure 1.1).

[Figure 1.1: diagram of DNA transposons (transposase), SINEs, LINEs (ORF1, ORF2) and HERVs (gag, pol, env), with columns for transposition intermediate (DNA or RNA), full-length size, copy number and fraction of genome. Recoverable values: DNA transposons, 2-3 kb, 300,000 copies, 3%; SINEs, 100-300 bp, 1,500,000 copies, 13%; HERVs, 4-11 kb, 450,000 copies, 8%.]

Figure 1.1 Classes of transposable elements in the human genome. LTRs are represented by arrows, inverted repeats are depicted by triangles, and open reading frames are shown as boxes. Copy numbers and percentages are as calculated from the initial draft of the human genome sequence. Diagram not drawn to scale and adapted from (Smit 1996; Lander et al. 2001).
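As a rough worked example of the proportions quoted in this chapter, the sketch below converts the genome fractions cited from Lander et al. (2001) into approximate megabases of sequence. It assumes the 2.9 Gb draft genome size given in section 1.1; the dictionary and function names are illustrative only, and the results are rounded estimates rather than measurements.

```python
# Approximate sequence covered by each class, assuming the 2.9 Gb draft
# genome size and the percentages quoted in the text (Lander et al. 2001).
GENOME_MB = 2900.0  # 2.9 Gb expressed in megabases

# Fraction of the draft genome, in percent. Keys are illustrative labels.
GENOME_PERCENT = {
    "coding regions": 1.5,
    "DNA transposons": 3.0,
    "SINEs": 13.0,
    "L1 elements": 17.0,
    "HERVs and other LTR elements": 8.0,
}


def percent_to_mb(percent, genome_mb=GENOME_MB):
    """Convert a genome percentage into megabases of sequence."""
    return genome_mb * percent / 100.0


def coverage_table():
    """Return {class name: approximate megabases covered}."""
    return {name: percent_to_mb(pct) for name, pct in GENOME_PERCENT.items()}


if __name__ == "__main__":
    for name, mb in coverage_table().items():
        print(f"{name:30s} ~{mb:7.1f} Mb")
```

Under these assumptions the annotated coding regions amount to roughly 43.5 Mb, compared with about 232 Mb for HERVs and other LTR-containing elements.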
1.3 Short Interspersed Nuclear Elements (SINEs)

SINEs are the most numerous mobile elements in the human genome with 1.5 million copies that collectively account for 13% of the draft genome (Lander et al. 2001; Deininger and Batzer 2002). Members of this group are typically between 100 and 300 bp in length and lack protein-coding capacity, although they possess an internal RNA polymerase III promoter. The most abundant class of SINEs in humans are the Alu repeats, with over 1 million elements (Smit 1996; Lander et al. 2001). These non-autonomous elements consist of two tandem repeats derived from the terminal segments of 7SL RNA, a component of the signal recognition particle. As they do not contain open reading frames (ORFs), Alu elements are believed to have replicated by borrowing the enzymes required for mobilization from retrotransposition-competent L1 elements. This suggested exploitation was recently demonstrated in the eel, where SINEs were shown to be recognized and retrotransposed by the machinery of LINE elements (Kajikawa and Okada 2002). Although not confirmed for human SINEs, the proposed hijacking mechanism appears to have been highly successful, as Alu elements now comprise nearly 11% of the human genome (Lander et al. 2001). While the activity and accumulation of Alu repeats have declined strikingly in the last 30-50 million years (MYr) (Smit 1999; Lander et al. 2001), some continue to multiply and integrate at a rate of approximately one per 200 births (Deininger and Batzer 1999). In addition to the Alu elements, two other older types of SINEs exist in the human genome, the MIR and MIR3 families (Lander et al. 2001). Unlike Alus, these mammalian-wide interspersed repeats (MIRs) are derived from tRNA sequences and are now believed to be inactive (Smit 1995).

SINEs, and more particularly Alu elements, have been shown to have a significant impact on the human genome through several mechanisms.
For instance, SINEs have been reported to promote unequal homologous recombination, cause insertional mutagenesis and contribute to the regulatory and protein-coding regions of cellular genes. Alu-mediated duplications, deletions and chromosome translocations have been described in nearly 40 human disorders (Deininger and Batzer 1999; Deragon and Capy 2000), while several de novo insertions of Alu elements appear to have resulted in diseases such as breast cancer (Miki et al. 1996) and neurofibromatosis (Wallace et al. 1991). In addition to their role as mutagenic agents, SINEs have also been found to contribute to gene diversity through their presence in transcripts or as novel regulatory elements (Brosius 1999; Tomilin 1999). Between 5% and 11% of human cDNAs have been estimated to contain Alu sequences, the majority of which occur in the untranslated regions (Makalowski et al. 1994; Yulug et al. 1995; P. Medstrand, pers. comm.). MIRs and Alus are also present in the coding region of a significant number of transcripts, where they can donate additional domains (Makalowski et al. 1994; Murnane and Morales 1995; Nekrutenko and Li 2001). A recent analysis of nearly 14,000 entries in the Unigene database identified SINEs in the protein-coding region of 1.5% of these human genes (Nekrutenko and Li 2001). Interestingly, the vast majority of the Alu-containing exons in genes appear to be alternatively spliced (Sorek et al. 2002). Finally, SINEs can also affect genes by modulating their transcription and translation. For example, the Alu sequence in a 5' UTR isoform of the BRCA1 gene was found to reduce translation efficiency (Sobczak 2002), while an Alu-L1 cassette present in the 5' UTR of a zinc finger gene was shown to influence both its transcription and translation (see Appendix).
1.4 Long Interspersed Nuclear Elements (LINEs)

LINE sequences follow SINEs in prevalence with a copy number of 850,000 elements but cover a greater proportion of the human genome as they are longer (Lander et al. 2001; Deininger and Batzer 2002). The predominant and best-characterized family of LINEs is the L1 elements, which account for 17% of the draft sequence (Lander et al. 2001). The consensus full-length L1 is 6 kb in length and contains, in addition to an internal polymerase II promoter and a poly-A tail, two open reading frames (see Figure 1.1). The first, ORF1, encodes an RNA-binding protein, while the 3' ORF2 codes for a protein with reverse transcriptase and endonuclease activity (Ostertag and Kazazian 2001). While some L1 elements still encode the proteins necessary for mobilization, the vast majority (>95%) of LINE elements are variably 5' truncated due to incomplete reverse transcription. As a result, it has been estimated that a subset of only 30 to 60 L1 copies are now capable of retrotransposition (Sassaman et al. 1997).

L1 sequences are thought to have played an important role in shaping the human genome. Besides their own retrotransposition and the amplification of Alu elements as described in section 1.3, L1 elements are likely responsible for the spreading of processed pseudogenes (Esnault et al. 2000). L1s are also believed to have contributed to the expansion of the human genome by mobilizing or transducing 3' flanking sequence (Moran et al. 1999; Pickeral et al. 2000). In addition, the retrotransposition of L1s appears to have had an impact on genomic stability. Their mobilisation probably resulted in large-scale deletions and inversions (Gilbert et al. 2002; Symer et al. 2002) and caused a variety of human diseases through direct insertional mutagenesis (Kazazian 1998; Ostertag and Kazazian 2001). As with SINEs, L1 elements also affect gene expression by contributing regulatory sequences to human genes.
For example, an enhancer for the human lipoprotein Lp(a) gene was found to be located within a LINE (Yang et al. 1998), while several other genes are reported to be transcribed from the antisense promoter of L1 elements (Speek 2001). Finally, other important cellular functions have been proposed to be derived from LINEs. In Drosophila, LINE-like elements form tandem repeats at the ends of chromosomes and appear to act like telomeres (Lewis et al. 1993; Pardue et al. 1996). In humans, the increased density of L1 elements around the X-inactivation centre has lent weight to the Lyon hypothesis (Lyon 1998) that repeats might play a role in X-chromosome inactivation (Bailey et al. 2000).

1.5 Human Endogenous Retroviruses (HERVs)

1.5.1 Origin and structure

Origin

HERVs (Human Endogenous Retroviruses) and other Long Terminal Repeat (LTR)-containing elements are the third most abundant group of mobile sequences in humans (Lander et al. 2001). These retroviral-related sequences comprise approximately 8% of the human genome (Lander et al. 2001) and are widely believed to be remnants of ancient retroviral insertions. Integration into the host genome is a required step in the life cycle of infectious retroviruses (Brown 1997). The reverse transcription of the retroviral RNA and insertion of the DNA form make the retroviral presence permanent, as integrated retroviruses, called proviruses, are rarely lost from the genome. If the retroviral presence does not kill the infected cell, the integrated provirus will be passed on to the daughter cells. Although retroviruses usually infect somatic cells, germ cells can sometimes also be infected. While the majority of these new integrants will be lost by random genetic drift, some will become fixed in the population. In such cases, the endogenous retrovirus will be inherited as a regular genetic trait through vertical transmission from parent to offspring.
Alternatively, it is also possible that some of the endogenous retroviral-like sequences in the human genome did not originate from infectious elements but are instead the precursors of exogenous retroviruses (Malik et al. 2000).

Attributes of exogenous retroviruses

Independent of their origin, full-length elements resemble infectious retroviruses in sequence and share the characteristic LTR-genes-LTR structure of integrated retroviruses (Wilkinson et al. 1994). Figure 1.2 illustrates the typical genomic organization of a provirus, in which the internal coding region is flanked by repeats, the LTRs, that are generated during reverse transcription of the retroviral RNA genome. The LTRs are required for the integration of the retroviral sequence into the host genome as they are recognized by integrase (Brown 1997). Furthermore, the LTRs bear regulatory elements that are necessary for the transcription of the retroviral genes (see section 1.5.3) (Rabson and Graves 1997). In simple retroviruses, these genes are gag, pro, pol and env, which encode the structural proteins and enzymatic machinery required for reverse transcription and replication (see Figure 1.2). The gene gag codes for the matrix (MA), capsid (CA) and nucleocapsid (NC) proteins involved in the encapsulation of viral RNA. The gene pro encodes a protease that cleaves the Gag-Pro-Pol and Gag precursor polypeptides. The gene pol codes for two enzymes, reverse transcriptase (RT) and integrase (IN), while env encodes the transmembrane (TM) and surface (SU) envelope proteins that allow the retrovirus to bind receptors on a susceptible host cell. As shown in Figure 1.2, the first three genes, encoding Gag, Pro and Pol, are expressed from unspliced transcripts, while TM and SU are translated from spliced mRNAs.
Complex retroviruses, such as the human immunodeficiency virus (HIV) and human T-cell leukemia virus (HTLV), encode additional proteins, Rev and Rex, that regulate the relative amount of full-length and spliced transcripts (Rabson and Graves 1997; Cullen 1998). They also code for other accessory proteins that modulate expression and infectivity: HIV encodes a transactivator of transcription, Tat (Cullen 1998), while the mouse mammary tumour virus (MMTV) encodes a superantigen (Sag) that influences infectivity (Golovkina et al. 1998).

[Figure 1.2: (A) provirus with PBS, gag, pro, pol, env and accessory genes flanked by LTRs; (B) full-length and spliced transcripts with splice donor (SD) and splice acceptor (SA) sites and the encoded proteins: matrix, major capsid, nucleocapsid, protease, reverse transcriptase, integrase, transmembrane protein and surface protein.]

Figure 1.2 Genetic organization of an infectious retrovirus. (A) Structure of the provirus as it is integrated into the host genome. The coding region of the retrovirus is flanked by LTRs composed of U3, R and U5 regions, which are further described in Figure 1.4. The gag, pro and pol genes in retroviruses are invariably ordered as shown. Accessory genes, such as tat and rev, are also present in complex retroviruses and overlap the env gene. (B) Transcription between the upstream U3 and downstream U5 regions results in the synthesis of full-length genomic RNA and a spliced transcript form in simple retroviruses. The proteins expressed from each transcript are indicated, as well as the splice donor (SD) and acceptor (SA) sites used to generate the subgenomic-sized mRNA. Diagram adapted from (Vogt 1997).

Characteristics of endogenous retroviruses

As with complex retroviruses, some endogenous elements, such as members of the HERV-K group, also carry additional genes. Some HERV-K(HML-5) elements appear to contain a gene which encodes a dUTPase between the pro and pol genes (Tristem 2000).
A HERV-K(HML-2) provirus encodes cORF, a protein analogous to the rev/rex genes of HIV and H T L V , that increases the expression of structural proteins by enhancing the export of full-length transcripts (Magin et al. 1999). While most full-length endogenous elements contain regions with sequence similarity to gag, pro, pol and env, these genes have accumulated mutations since their integration into the primate lineage millions of years ago. The vast majority of retroviral elements are now unable to code for retroviral proteins as their open reading frames are littered with stop codons,  frameshifts  and deletions  (particularly in the env gene, see Figure 1.3). Furthermore, an estimated 85-90 % of HERVs (Lander et al. 2001; Mager and Medstrand In Press) have also undergone recombination between their LTRs resulting in the complete loss of coding sequences and the presence of many solitary LTRs in the nuclear D N A (Figure 1.3).  In addition to the endogenous  retroviruses, another group of human L T R elements exists in the human genome, the MaLRs (Mammalian Apparent L T R Retrotransposons). Full length MaLR sequences consist of two LTRs and a small internal region (1.6 kb) (Smit 1993) with sequence similarity to the gag region of H E R V - L elements (Smit 1999) (Figure 1.3).  As a result, they are sometimes  grouped with class III retroviral elements (see section 1.5.2 for classification).  12  PBS  A.  gag  pro  LTR V  pol  Aenv LTR  B. LTR V  c.  Figure 1.3 Structure of endogenous retroviral elements and related sequences. (A) Full length endogenous retrovirus. The organization and genes are as described in Figure 1.2. However, the majority of the ORFs contain inactivating mutations or deletions (represented by the delta symbol). The primer binding site (PBS) used for H E R V nomenclature is shown. (B) Solitary LTR. The internal coding region of HERVs is often lost by homologous recombination of the LTRs. 
(C) MaLR elements contain a small central region that resembles gag in sequence and are flanked by two long terminal repeats. Figure adapted from (Griffiths 2001).

1.5.2  Classification and diversity

Nomenclature and diversity

HERVs have been broadly grouped into three classes based on the sequence similarity of their pol regions to those of exogenous retroviruses: class I includes HERVs related to mammalian gammaretroviruses, such as the murine leukemia virus (MLV); class II comprises all HERVs related to mammalian betaretroviruses, such as the mouse mammary tumour virus (MMTV); and class III contains endogenous elements distantly related to spumaviruses (Smit 1999; Mager and Medstrand In Press). HERVs are further classified on the basis of the tRNA believed to be used for priming the reverse transcription of the virion RNA into DNA (Wilkinson et al. 1994; Mager and Medstrand In Press). For example, a retroviral element that carries a primer binding site (PBS) homologous to the 3' end of a glutamic acid tRNA is called HERV-E, E being the single-letter code for glutamic acid. However, this PBS-based nomenclature is problematic as diverse elements can share homology to a common tRNA. For this reason a second type of taxonomy, based on sequence similarity, has been proposed and is used in conjunction with the PBS classification for the HERV-K family (see Table 1.1) (Medstrand and Blomberg 1993). In addition, a third naming scheme is used in the repeat database Repbase (Jurka 2000), a catalogue that serves as the reference when identifying repetitive sequences with the program RepeatMasker (Smit and Green 1999). The nomenclature of endogenous retroviral elements is under review (Tristem 2000) and is likely to be modified considering the diversity of elements in the genome. Over 200 different human retroviral-like elements or LTRs are already listed in Repbase (Jurka 2000) but few of the repeats have been fully characterized.
In addition, a recent phylogenetic analysis of 7% of the genome identified ten different HERV groups that were either novel or only partially described (Tristem 2000).

Abundance and distribution

The human genome contains hundreds of HERV families which, as previously mentioned, are organized in three classes. Class I comprises 2.9% of the draft genome sequence and contains 132 groups, including the HERV-E, HERV-H and ERV9 families (see Table 1.1). Class II is smaller and consists of 0.3% of nuclear DNA and 20 subfamilies, such as HERV-K(HML-2) and HERV-K(HML-4). Finally, class III accounts for 5.1% of the genome when combining the 73 subfamilies of HERV-Ls and MaLRs (Lander et al. 2001). As shown in Table 1.1, there is also variation in the number of elements belonging to each group. While it has been estimated that there are approximately 300 full-length ERV9 elements, there appear to be only 10 full-length HERV-K(HML-4) elements (Mager and Medstrand In Press). In each family, the frequency of solitary LTRs is much higher than that of full-length elements, indicating that solo LTRs might be less harmful and/or more stable in the genome than complete HERVs. For the above families, the number of solitary LTRs has been estimated at 5000 and 800, respectively (Mager and Medstrand In Press).

The high diversity and abundance of HERVs is believed to have resulted from the entry of novel retroviruses into the germ-line followed by an increase in proviral copy number. This retroviral amplification could have occurred either through re-infection or by retrotransposition. The increase in copy number then continued until mutations or recombination between the LTRs disrupted or deleted genes required for replication (Boeke and Stoye 1997). While the integration preference of HERVs remains to be determined, their presence was likely selected against in some areas of the genome, resulting in a non-random distribution pattern (Medstrand et al.
2002). HERVs are particularly underrepresented in some genomic regions, such as the homeobox gene clusters. For example, not a single retroviral sequence is present in the human HoxD cluster, which spans over 100 kb (Lander et al. 2001). Overall, retroviral-like elements appear to be excluded from genes as their densities are lower within genes than predicted based on the surrounding GC content (Medstrand et al. 2002). Moreover, the HERVs found in genes have a higher tendency to be in the antisense orientation, likely to avoid the effects of the transcriptional regulatory elements present in the retroviral LTRs (Medstrand et al. 2002). Although the presence of retroviral sequences in genes appears to be selected against, several LTR elements have been located in transcripts. A recent analysis of 16,500 entries in RefSeq, a curated collection of non-redundant sequences, resulted in the identification of over 600 mRNAs containing retroviral sequences (P. Medstrand, pers. comm.).

Age and polymorphism

The majority of human retroviral elements were actively transposing early in the evolution of the primate lineage but some HERVs likely entered the genome prior to the radiation of mammals. MaLRs and HERV-L-like sequences are present in non-primate mammalian species (Smit 1993; Benit et al. 1999), suggesting that they integrated over 75 MYr ago. Most class I elements, on the other hand, are believed to have amplified 30 to 45 MYr ago, shortly before or following the divergence of Old World Monkeys (OWM) and New World Monkeys (NWM). Members of the HERV-E and HERV-H families have been found at the same chromosomal location in humans and baboons (an OWM), indicating that the integration event occurred before the divergence of these species, 30 MYr ago (Goodchild et al. 1993; Medstrand et al. 2001). Other class I elements have also been identified in orthologous positions in humans and marmoset (a NWM) (Landry J.-R., unpublished).
Class II elements are the youngest and are believed to have been active as recently as 200 000 years ago (Turner et al. 2001). This retroviral age has been estimated by measuring the divergence of the LTR sequences. Because the two LTRs of a provirus are identical at the time of insertion, the divergence between them reflects the time elapsed since integration, which can be calculated using a pseudogene divergence rate of 0.13 to 0.21% per MYr (Tristem 2000). By identifying HERV-K elements in different primates (Leib-Mosch et al. 1993), and by calculating retroviral age using a mutation rate of 0.13% per MYr (Sverdlov 2000), it has been estimated that class II HERVs first colonized the ancestral primate genome over 30 MYr ago and continued to proliferate over a considerable period of time. HERV-K(HML-2) is the only group known to have amplified after the divergence of humans and chimpanzees 7 MYr ago and to include elements specific to humans (Medstrand and Mager 1998; Barbulescu et al. 1999). However, to date, only a few retroviral insertional polymorphisms have been identified in the human population: two recently integrated full-length HERV-K elements were found to be polymorphic (Turner et al. 2001) while a solitary HERV-K LTR was found in the HLA-DQB1 locus of some but not all humans (Medstrand and Mager 1998). Because few differences have been observed in retroviral integration patterns in humans, it is likely that very few HERVs, if any, can still retrotranspose. While the mobilization capacity of the vast majority of retroviral elements appears to have been silenced, many have remained active at the transcriptional and translational level (see section 1.5.3).

Table 1.1 Characteristics of some HERV families.
| Class | Name | PBS | Copy # | mRNA expression | Protein expression | References |
| I | HERV-E | glu | 250 (1000) | placenta, tumour cell lines, normal tissues | not detected, defective ORFs | (Rabson et al. 1983; Rabson et al. 1985; Gattoni-Celli et al. 1986) |
| I | HERV-H | his | 1000 (1000) | teratocarcinoma cells, placenta, tumour cell lines | not detected, defective ORFs | (Johansen et al. 1989; Wilkinson et al. 1990; Hirose et al. 1993) |
| I | HERV-R (ERV-3) | arg/leu | 100 (125) | placenta, low levels in most cells | Env in placenta | (Kato et al. 1987; Cohen et al. 1988; Venables et al. 1995) |
| I | ERV-9 | arg | 300 (5000) | teratocarcinoma cells and placenta | not detected, defective ORFs | (La Mantia et al. 1991) |
| I | HERV-W | trp/arg | 40 (1100) | placenta, testis | Env in placenta | (Blond et al. 1999; Mi et al. 2000) |
| II | HERV-K (HML-2) | lys | 60 (2500) | teratocarcinoma cells, placenta and normal tissue | Protease, Gag, Integrase, RT, Env | (Lower et al. 1993; Medstrand and Blomberg 1993; Mueller-Lantzsch et al. 1993; Sauter et al. 1995; Kitamura et al. 1996; Tonjes et al. 1997; Berkhout et al. 1999) |
| II | HERV-K (HML-4) | lys | 10 (800) | breast cancer cell line and placenta | not detected, defective ORFs | (Seifarth et al. 1998) |
| III | HERV-L | leu | 200 (6000) | placenta | not detected, defective ORFs | (Cordonnier et al. 1995) |

Several other HML groups, as well as other class I and class III HERV groups, exist but for simplicity only the best characterized families are listed. The three-letter amino acid codes denote the primer binding sites (PBS). The copy numbers per haploid genome were estimated in August 2001 and are from (Mager and Medstrand In Press). The values in parentheses represent the number of solitary LTRs approximated for each family. Only tissues with abundant retroviral expression are shown.

1.5.3 Activity and Expression

RNA expression and promoter activity

The transcription of endogenous retroviral sequences has been reported in many cell types, ranging from normal tissues to tumours (Wilkinson et al.
1994) but HERV expression has been better characterized in placenta and tumour cell lines as a result of its higher level in these tissues (see Table 1.1). Analysis of HERV expression is complex since many families contain several hundred non-homogeneous members. While considerable variation in the promoter activity of HERV LTRs, as measured by transient transfection of reporter constructs, has been documented for several families (Feuchter and Mager 1990; Baust et al. 2001; Schon et al. 2001), the techniques used to detect transcripts and assay for expression are usually not specific for individual elements. In addition, the transcription pattern described is influenced by the sensitivity of the method used. For example, by Northern hybridization, the expression of HERV-K(HML-2) elements is easily detectable in teratocarcinoma cell lines (Lower et al. 1993) while HERV-W transcripts are abundant in placenta (Blond et al. 1999). The RNA patterns from both families resemble those seen with complex retroviruses: full-length and subgenomic env transcripts are detected in the previously mentioned tissues, as well as small mRNA species representing alternative splicing or transcription from defective proviruses (Lower et al. 1993; Blond et al. 1999). However, when using more sensitive methods such as RT-PCR or analysing expressed sequence tag (EST) databases, the transcription from these elements appears to be less tissue-restricted. Low levels of HERV-W transcripts can be identified in fetal spleen and liver (Blond et al. 2001) while HERV-K(HML-2) and several class I elements can be detected in peripheral blood mononuclear cells and leukocytes (Medstrand et al. 1992).

Transcription regulation

The expression of retroviral elements is primarily controlled by specific regulatory sequences located within the U3 region of LTRs (Rabson and Graves 1997).
This sequence ranges in length from 150 to 1200 bases in different retroviruses (Leib-Mosch et al. In Press) and contains two major domains involved in transcription initiation. A promoter, usually a consensus TATA box, is present near the end of the U3 region, close to the U3/R boundary, while enhancer sequences with transcription factor binding sites are found further upstream in the U3 region (see Figure 1.4) (Rabson and Graves 1997). A variety of cellular proteins are known to bind the U3 region of HERV LTRs and to be necessary for high transcriptional activity. The Sp1 factor stimulates transcription of HERV elements from various families (La Mantia et al. 1992; Nelson et al. 1996; Sjottem et al. 1996; Schulte et al. 2000) while the Myb protein and YY1 transcription factor influence the expression of HERV-H (de Parseval et al. 1999) and HERV-K (Knossl et al. 1999) elements, respectively.

Besides the U3 region, functional elements important in HERV expression are also present in the R and U5 sections of the LTR (Rabson and Graves 1997). The R region consists of a short redundant sequence present at both termini of the retroviral RNA (see Figure 1.2) and contains both the transcription start and polyadenylation sites. In the 5' LTR, the first position of the R sequence is defined by the transcription start site while in the 3' LTR, the end of the R section is 15 to 20 bp downstream of the polyadenylation signal (see Figure 1.4). The last region of the LTR, U5, corresponds to a unique sequence found at the 5' end of retroviral transcripts. Although the U5 region is not involved in transcription, it is part of the 5' untranslated region, which can play a role in post-transcriptional regulation.
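The U3/R/U5 layout described above can be made concrete with a short sketch. The Python below is illustrative only and is not part of the analyses reported in this thesis: the function name split_ltr, the LtrRegions container, the toy sequence and the default 18 bp offset (picked from within the 15 to 20 bp range quoted above) are all assumptions. It treats a known transcription start site as the first base of R and places the R/U5 boundary a fixed distance downstream of the first AATAAA signal.

```python
from dataclasses import dataclass

POLYA_SIGNAL = "AATAAA"  # canonical polyadenylation signal


@dataclass
class LtrRegions:
    """Hypothetical container for the three sections of one LTR."""
    u3: str
    r: str
    u5: str


def split_ltr(ltr_seq: str, tss_index: int, polya_offset: int = 18) -> LtrRegions:
    """Split an LTR sequence into U3, R and U5.

    tss_index    0-based position of the transcription start site,
                 taken here as the first base of R.
    polya_offset assumed distance (bp) from the end of the AATAAA
                 signal to the R/U5 boundary.
    """
    seq = ltr_seq.upper()
    if not 0 <= tss_index < len(seq):
        raise ValueError("tss_index lies outside the sequence")

    # Look for the polyadenylation signal downstream of the start site only.
    signal_start = seq.find(POLYA_SIGNAL, tss_index)
    if signal_start == -1:
        raise ValueError("no AATAAA signal found downstream of the start site")

    # R ends a fixed distance after the signal (capped at the sequence end).
    r_end = min(len(seq), signal_start + len(POLYA_SIGNAL) + polya_offset)
    return LtrRegions(u3=seq[:tss_index], r=seq[tss_index:r_end], u5=seq[r_end:])


if __name__ == "__main__":
    # Toy LTR: a TATA box sits in U3 a short distance upstream of the start site.
    toy_u3 = "GC" * 10 + "TATAAA" + "GC" * 12
    toy_r = "G" + "C" * 9 + POLYA_SIGNAL + "C" * 18
    toy_u5 = "T" * 20
    regions = split_ltr(toy_u3 + toy_r + toy_u5, tss_index=len(toy_u3))
    print(len(regions.u3), len(regions.r), len(regions.u5))
```

In practice the start site and polyadenylation site of a given HERV LTR would have to be determined experimentally; a fixed offset is only a convenient stand-in for this illustration.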
While the transcription of retroviral sequences is mostly under the control of the LTRs through the actions of cellular proteins, a wide variety of additional factors, including steroid hormones, retrovirally encoded cORF, and methylation, also influence HERV RNA levels. The expression of HERV-K(HML-2) (Domansky et al. 2000), ERV-9 (La Mantia et al. 1991) and HERV-H (Wilkinson et al. 1994) elements has been shown to be downregulated in NT2/D1 teratocarcinoma cells by treatment with retinoic acid, which induces the differentiation of these cells. Transcription of HERV-K(HML-4) appears to be steroid dependent in the breast cancer cell line T47D, as expression was only detected following treatment with estrogen and progesterone (Seifarth et al. 1998). The stability of HERV-K(HML-2) unspliced transcripts is, as previously mentioned, modulated by cORF proteins, which recognize a sequence located in the 3' LTR and favour the export of full-length mRNAs from the nucleus (Magin et al. 1999). Finally, DNA methylation has been proposed to play an important role in regulating the expression of retroviral sequences (Yoder et al. 1997). Results from methyltransferase mutant mice have supported the proposal that cytosine methylation suppresses the expression of endogenous retroviral sequences (Walsh et al. 1998). It has recently been speculated that the differential methylation of LTR promoters could result in variations of mammalian phenotypes (Whitelaw and Martin 2001). However, changes in methylation status associated with transcriptional activity have not been reported for human ERVs due, in part, to the difficulty in analysing large repetitive and heterogeneous families of sequences.

[Figure 1.4 image not reproduced. Composite LTR showing binding sites for YY1 (HERV-K), Myb (HERV-H) and Sp1 (ERV9, HERV-H), the promoter (TATA box) and the polyA signal (AATAAA); region lengths shown under the LTR are 150-1200 bp, 10-250 bp and 35-450 bp.]

Figure 1.4 Transcriptional control elements of LTRs.
The organization of a composite HERV LTR is illustrated. Regulatory regions are located within the U3 region (displayed in grey) and the transcription initiation site (represented by an arrow) defines the U3/R boundary. Polyadenylation signals are located within the R region (in black), where the polyadenylation site defines the R/U5 boundary. Transcription factor binding sites are indicated and the HERV families in which they have been identified are in parentheses. Only cis-elements for which functional data are available are shown. The range of sequence lengths for the U3, R and U5 regions, as compiled in (Leib-Mosch et al. In Press), is displayed under the LTR.

Protein expression

Although the preponderance of HERV transcripts are translationally defective, a few coding-competent retroviral elements have been documented. Full-length env ORFs have been identified in at least 4 HERV-K elements (Barbulescu et al. 1999; Mayer et al. 1999; Turner et al. 2001), as well as in a HERV-W (Blond et al. 1999), a HERV-R (Cohen et al. 1985) and potentially in one HERV-H element (Lindeskog et al. 1999). Antibodies raised against HERV-R (Venables et al. 1995) and HERV-W (Blond et al. 2000) were shown to react with syncytiotrophoblast proteins in placenta, confirming that these HERV-encoded Env proteins are expressed. In addition, the pro and pol gene products of HERV-K(HML-2) elements have been demonstrated to be enzymatically active (Mueller-Lantzsch et al. 1993; Kitamura et al. 1996) while the HERV-K(HML-2)-encoded cORF protein was shown to be functional (Magin et al. 1999). In 1999, a member of the HERV-K family was identified with intact open reading frames for almost all retroviral genes, with the exception of reverse transcriptase, which contained a mutation in a highly conserved motif (Mayer et al. 1999), while a functional reverse transcriptase enzyme was cloned from another element (Berkhout et al. 1999).
Most recently, a full-length polymorphic HERV-K, named HERV-K113, was found to have maintained complete open reading frames for all retroviral genes and to contain no amino acid substitutions in conserved sequence motifs (Turner et al. 2001).

Immature retroviral-like particles, encoded by HERV-K (Boller et al. 1993; Lower et al. 1993), have been identified in GH teratocarcinoma and placenta cell lines (Lower et al. 1996). Other particles have also been reported to be released from a human T47D mammary carcinoma cell line (Seifarth et al. 1995). However, the existence of infectious endogenous retroviral elements cannot be inferred from the detection of retroviral particles in some cell lines, as their formation only entails the expression of retroviral core proteins. To date, the particles identified in teratocarcinoma cells have not convincingly been shown to be able to infect human or other mammalian cells (Blomberg et al. Submitted). There is therefore no evidence that endogenous retroviruses in humans still retain the ability to produce infectious viruses, unlike the situation in other species such as pigs and mice, where some endogenous retroviruses have been shown to be maintained both as inactive genetic traits and as infectious elements (Patience et al. 1997).

1.5.4 Biological consequences

Pathological implications

In animals, both exogenous and endogenous retroviruses are known to be involved in carcinogenesis. As they do not encode oncogenes, the endogenous elements are thought to contribute to malignancies primarily through their effect on adjacent cellular genes following de novo insertions. The best characterized examples of the involvement of murine endogenous retroviruses in oncogenesis are the spontaneous leukemias caused by the activation of heavily recombined MLV in AKR mice, and the mammary tumours and T-cell lymphomas resulting from the reintegration of MMTV in GR mice (Boeke and Stoye 1997).
In both cases, de novo integration of the retroviral sequences in the vicinity of proto-oncogenes is believed to result in aberrant activation of the latter. Besides modulating the transcriptional regulation of oncogenes, the reinsertion of murine endogenous retroviruses has also resulted in mutagenesis through disruption of the coding region of several cellular genes. Some of the mouse insertional mutations include the hairless mutation caused by an MLV (Stoye et al. 1988), the null alleles at the nude and albino loci that result from ETn (early transposon) insertions (Hofmann et al. 1998) and the inner ear and kidney deformities caused by an IAP ERV insertion in the Eya1 gene (Johnson et al. 1999). It has been estimated that 10-15% of all spontaneous mouse mutations are caused by retrotransposition of retroviral elements (Kazazian and Moran 1998; Waterston et al. 2002). In contrast, insertional mutations by retroelements in the human population are predicted to account for less than 0.2% of spontaneous mutations (Kazazian 1999) and so far none of these cases involve retroviral sequences.

While no HERVs have been documented to be engaged in insertional mutagenesis, it has been speculated that some HERV-K(HML-2) elements might be involved in carcinogenesis as there appears to be some correlation between HERV-K expression and testicular tumours. In one study, nearly 80% of patients with seminomas exhibited antibody titers against HERV-K10 (HML-2) at the time of detection of the tumours compared to 0.1 to 0.5% of healthy individuals (Sauter et al. 1995). Another potential piece of evidence regarding the oncogenic potential of HERV-K elements came from recent work showing that cORF protein can induce tumours in nude mice (Boese et al. 2000). In addition, HERV-K(HML-2)-encoded cORF was found to bind a transcription factor required for spermatogenesis, PLZF (promyelocytic leukemia zinc finger protein) (Boese et al.
2000). Since disruption of spermatogenesis is associated with increased frequency of germ-cell tumours, PLZF binding by cORF might lead to impaired spermatogenesis and testicular cancer.

HERVs have also been implicated in a number of autoimmune and neurological diseases, such as lupus erythematosus, rheumatoid arthritis, diabetes, multiple sclerosis (MS) and type 2 diabetes (reviewed in (Blomberg et al. Submitted)). Despite intense effort from several groups, very little direct evidence to support the proposed role of endogenous retroviruses in the etiology of these diseases has been found. HERV transcripts were identified in individuals suffering from schizophrenia (Deb-Rinker et al. 1999) and MS (Perron et al. 1997), but no causative associations could be made between the presence of the retroviral mRNAs and the disease. A HERV-K(HML-2) element with an env gene encoding a superantigen (SAG) was reported to be associated with insulin-dependent diabetes mellitus (IDDM) (Conrad et al. 1997). This HERV was proposed to be involved in the etiology of IDDM as transcripts of this element were identified in the plasma of patients with type I diabetes but not of healthy controls (Conrad et al. 1997). However, this finding was later contested as the disease-specific expression of this HERV-K element could not be reproduced (Lan et al. 1998; Lower et al. 1998; Murphy et al. 1998; Jaeckel et al. 1999).

So far, a pathogenic role for HERVs has only been confirmed in one condition, male infertility. HERVs, like other repetitive elements, can give rise to genomic instability by serving as sites for illegitimate recombination (Hughes and Coffin 2001). Deletion of the azoospermia factor a (AZFa) region on Yq11 was found to result from a single cross-over event between two endogenous proviruses that share 94% sequence identity.
The outcome of the unequal homologous recombination was a 700 kb deletion in the AZFa region, which severely impaired spermatogenesis and caused infertility (Blanco et al. 2000; Sun et al. 2000).

Genome evolution

Besides pathogenic rearrangements, unequal homologous recombination between retroviral sequences has also been implicated in shaping genome plasticity (Hughes and Coffin 2001). It has been suggested that the sequence diversity and genomic duplications present in the human Major Histocompatibility Complex (MHC) resulted partly from recombination between HERV elements (Dawkins et al. 1999). Support for this hypothesis has come from the observation that retroviral and retroelement sequences appear to be associated with the ends of duplicated and deleted segments in the MHC (Gaudieri et al. 1999). The existence of two isoforms of the growth hormone receptor gene (GHR) in the human population is also attributed to LTR recombination. The presence or absence of GHR exon 3 in different individuals is believed to result from recombination between complete and partial HERV-P LTRs flanking the exon (Pantel et al. 2000).

It has also been postulated that the presence of retroviral sequences in the human genome contributed to evolution by limiting infections from exogenous viruses (Sverdlov 2000). While there is no direct evidence that endogenous retroviruses protect humans from further viral attacks, ERVs in mice have been shown to confer resistance to their exogenous counterparts. One of the best characterized examples involves the Fv4 locus, which corresponds to the env gene of an MLV and protects against MLV infection in some mouse strains. The ERV is thought to prevent retroviral entry by blocking the cellular receptors with endogenously env-encoded surface proteins (reviewed in (Best et al. 1997)).
A second identified murine resistance gene, Fv1, protects through an unknown mechanism but is noteworthy, as it has recently been shown to possess sequence similarity to HERV-L related elements (Benit et al. 1997).

Table 1.2 Putative biological effects of HERVs and other endogenous retroviruses.

| Type | Function | ERV | Example * | References |
| Pathology | Cancer | MMTV | Insertions activate cellular oncogenes and cause mammary and lymphoid tumours in GR mice | (Boeke and Stoye 1997) |
| Pathology | Insertional mutagen | IAP | Integration in agouti results in obese and yellow mice | (Duhl et al. 1994) |
| Pathology | Deletion | HERV | Recombination between two elements results in the deletion of the AZFa gene and infertility | (Blanco et al. 2000; Sun et al. 2000) |
| Genome evolution | Rearrangement | HERV | Association with endpoints of duplicated blocks in the MHC region | (Dawkins et al. 1999) |
| Genome evolution | Resistance | MLV | Expression of endogenous SU protein blocks ecotropic MLV infection in mice | (Best et al. 1997) |
| ORFs | Placental development | HERV | Env protein of HERV-W contributes to the formation of the placental syncytiotrophoblast | (Blond et al. 2000; Mi et al. 2000) |
| ORFs | Coding region of genes | HERV | Terminal 67 aa of the short form of leptin receptor encoded by a HERV-K LTR | (Kapitonov and Jurka 1999) |
| Gene regulation | PolyA signal | HERV | HERV-K LTR alternatively polyadenylates the VEGF receptor 3 | (Hughes 2001) |
| Gene regulation | Promoter | | see Table 1.3 | |
| Gene regulation | Enhancer | | see Table 1.3 | |

* Only one example is listed for each case.

Coding regions

Although no human ERV protein has yet been shown to have a beneficial role by preventing exogenous infections, some HERV-encoded proteins have been implicated in other cellular functions. For example, the HERV-W and HERV-R encoded Env proteins have been speculated to play a role in placental biology as they are highly expressed in this tissue.
Recent studies have demonstrated that the product of the HERV-W env gene, which has been named syncytin, has fusogenic properties and can induce the formation of syncytia in cell lines (Blond et al. 2000; Mi et al. 2000). It has been suggested that the retrovirally encoded syncytin may perform a similar role in vivo and therefore might contribute to the development of the placenta by promoting cytotrophoblast fusion.

Besides encoding functional proteins, other retroviral sequences have evolved a coding role by providing additional sequence to cellular genes. A class I retroviral element was shown to code for exons 2 and 3 of the human transaldolase gene (Banki et al. 1994) while the LTR of a HERV-K element was found to encode the terminal 67 amino acids (aa) of the short form variant of the human leptin receptor (OBR) (Kapitonov and Jurka 1999). The OBR isoform resulted from alternative splicing in which additional coding sequence was gained; however, in several cases, variant splicing into retroviral sequences simply results in truncated proteins. As an example, the short form of the VEGF receptor 3 contains a prematurely terminated ORF due to the splicing of exon 29 to a retroviral LTR (Hughes 2001). In many situations, as appears to be the case for the last two genes described, an additional outcome of alternative splicing into retroviral sequence is the use of a novel polyadenylation signal donated by the LTR.

Gene regulation

HERVs possess regulatory elements within their LTRs which, as previously discussed in section 1.5.3, direct the transcription of retroviral genes. These sequences have the potential, when the integration site of a HERV is adjacent to a cellular gene, to modify the expression pattern of human genes.
Several examples have been described in the literature where retroviral sequences have acquired a biological function by contributing some of their regulatory elements to human genes, as depicted in Figure 1.5.

Retroviral elements contain a polyadenylation signal in their LTRs that can modulate cellular transcripts by processing their 3' ends. As already mentioned, alternative transcripts of the VEGF receptor 3 (Hughes 2001) and OBR gene (Kapitonov and Jurka 1999) have been found to terminate with a polyadenylated HERV-K LTR, indicating that the retroviral polyA signal was employed. Several novel human genes, named PLT, HHLA2 and HHLA3, have also been reported to utilize the polyadenylation signals of retroviral elements belonging to the HERV-H family (Goodchild et al. 1992; Mager et al. 1999).

Full-length elements and solitary LTRs also have promoter and enhancer sequences that can affect the transcription of human genes by driving their expression and possibly altering their tissue specificity and/or their temporal/developmental expression. A selection of human genes for which retroviral promoters and enhancers have been described is listed in Table 1.3 and the effects of the retroviral regulatory sequences are summarized below.

[Figure 1.5 image not reproduced. Panels A, B and C; labels include P, SD, PolyA, exons 1 to 3 and polyadenylated (AAA) transcripts.]

Figure 1.5 LTR involvement in cellular gene regulation. The retroviral elements are illustrated as in the previous figures, with P representing the promoter, E the enhancer, and PolyA the polyadenylation signal. The native gene promoter is shown as a grey circle and the cellular exons as black boxes. The primary theoretical effects of LTRs on the expression of nearby genes are shown. (A) The retroviral LTRs can provide an alternative promoter to downstream human genes. (B) HERVs can modulate the activity or tissue-specificity of nearby native promoters by acting as enhancers.
(C) LTRs can prematurely terminate cellular transcripts by donating polyadenylation signals.

Table 1.3 HERVs involved in the transcriptional regulation of human genes.

| HERV | Gene | Role | References |
| HERV-K (HML-8) | Leptin | Enhancer | (Bi et al. 1997) |
| ERV-9 | ADH1C (alcohol dehydrogenase 1C) | Enhancer | (Chen et al. 2002) |
| ERV-9 | Intergenic region of β-globin cluster | Promoter | (Plant et al. 2001) |
| ERV-9 | ZNF80 (zinc finger protein) | Promoter | (Di Cristofano et al. 1995) |
| HERV-R | PLK (human provirus linked Krüppel related gene) | Promoter | (Kato et al. 1990) |
| HERV-H | PLA2L (phospholipase A2 like) | Promoter | (Feuchter-Murthy et al. 1993; Kowalski et al. 1999) |
| HERV-E | AMY1C (amylase 1C) | Enhancer | (Ting et al. 1992) |
| HERV-E | PTN (pleiotrophin) | Enhancer | (Schulte et al. 1996; Schulte and Wellstein 1998) |
| HERV-E | APOC1 (apolipoprotein C1) | Alternative promoter | Chapters 2 and 3 |
| HERV-E | EDNRB (endothelin B receptor) | Alternative promoter | Chapters 2 and 3 |
| HERV-E | MID1 (midline 1) | Alternative promoter | Chapters 4 and 5 |

Two solo LTRs have been shown to participate in the transcription of human genes. A solitary ERV9 LTR has been reported to selectively promote the expression of a zinc finger gene, ZNF80, in some haematopoietic cell lines (Di Cristofano et al. 1995). Two fusion transcripts involving a full-length HERV-R element and the zinc finger gene PLK (provirus-linked Krüppel related) have also been described. These chimeric mRNAs, which are abundant in placenta, contained the HERV-R encoded env gene spliced to the open reading frame of PLK (Kato et al. 1990). The LTR of a near-complete HERV-H element also promotes the expression of a complex chimeric transcript, termed PLA2L (phospholipase A2 like), in teratocarcinoma cells (Feuchter-Murthy et al. 1993). Alternative forms of the hybrid PLA2L mRNA have been identified, including a tripartite fusion that results from intergenic splicing between the HERV-H element and two downstream genes.
The first gene, named HHLA1 (HERV-H LTR-associating 1), is novel, while the downstream gene encodes the presumed human ortholog of the murine otoconin-90 gene (Oc90), normally expressed in the developing inner ear (Kowalski et al. 1999). No proteins appeared to be translated from the retrovirally transcribed PLA2L, probably as a result of the stable secondary structure predicted to form between the HERV-H 5' UTR and HHLA1 (Kowalski and Mager 1998).

Retroviral elements have also been shown to influence the expression of human genes by providing enhancing sequences. The LTR of a HERV-K element was found to contribute a placental enhancer for the leptin gene (Bi et al. 1997). A liver-specific positive regulatory element for the alcohol dehydrogenase 1C gene (ADH1C) was established to reside within the U3 region of an ERV9 LTR present approximately 1 kb upstream of the transcriptional start site of ADH1C (Chen et al. 2002). Another ERV9 LTR near the β-globin locus control region (LCR) appears to activate intergenic transcription of the cluster by acting as both a promoter (Plant et al. 2001) and possibly an enhancer (Long et al. 1998) in erythroid cells.

Enhancing sequences are also contributed to cellular genes by full-length elements. The insertion of a nearly complete HERV-E element in the intron of the pleiotrophin gene (PTN), between the 5' untranslated region and coding region, was found to result in the creation of a novel placental-specific promoter. The HERV itself did not drive the transcription of the alternative retroviral-PTN isoform, as the chimeric mRNA originated a few bases upstream of the LTR. However, the insertion of the element generated a trophoblast enhancer which, in collaboration with upstream non-repetitive sequence, led to the formation of an additional promoter (Schulte et al. 1996).
This PTN-associated HERV-E element integrated in the primate lineage after the divergence of apes and Old World monkeys (OWM) and added a new promoter function to humans and great apes as compared to monkeys and rodents, which only possessed the evolutionarily common central nervous system promoter (Schulte et al. 1996; Schulte and Wellstein 1998). Concordantly, placental expression of PTN could not be detected in rhesus monkey, an OWM (Schulte and Wellstein 1998), while HERV-PTN transcripts could be identified in human placenta, trophoblast and trophoblast-derived cells (Schulte et al. 1996).

Another full-length HERV-E, located in the antisense orientation upstream of each of three AMY1 genes in the human amylase gene complex, is believed to act as a salivary-specific enhancer (Samuelson et al. 1990). The retroviral element, which inserted prior to the duplication of the AMY genes, was first suspected to confer tissue-specificity as its integration was found to correlate with the transition from pancreatic to parotid expression for the AMY1 genes (Samuelson et al. 1990). This was confirmed in the case of AMY1C, where a 700 bp fragment representing the 5' UTR of the HERV-E retrovirus was found to be sufficient for the parotid expression of this gene (Ting et al. 1992). Interestingly, the retroviral sequence does not appear to be required for the salivary expression of amylase in some primates. Although the HERV-E element integrated in the amylase cluster after the divergence of apes from Old World monkeys, amylase expression could be demonstrated in the saliva of macaques (OWM) (Samuelson et al. 1996).

It is evident from the above enumeration that there are now a number of documented cases in which retroviral sequences have been assimilated in the human genome to provide enhancers, promoters, and polyadenylation signals to cellular genes.
This recurrent involvement of LTRs in the expression of human genes supports the notion that endogenous retroviruses were important players in the evolution of gene regulation. However, a better understanding of the biological significance and contribution of HERVs in the transcription of the human genome awaits the results of further research.

1.6 Thesis objectives and organization

The overall objective of my thesis was to investigate endogenous retroviruses involved in the transcriptional regulation of human genes. More specifically, the goal of my project was to examine HERVs that contribute to the expression of cellular genes by providing alternative promoters. This aim was addressed through the study of three retroviral elements that respectively participate in the transcription of the human apolipoprotein C1 (APOC1), endothelin B receptor (EDNRB) and midline 1 (MID1) genes. The primary focus of my work, which is presented in chapters 2, 3 and 4, was to characterize the promoters of the three LTRs and to examine the significance and contribution of the HERV-E elements in the expression of the above-mentioned genes. A secondary intent, which is described in chapter 5, was to analyse alternative, non-retroviral promoters for the genes found to be transcribed by LTRs.

Chapter 2
This section describes the discovery and confirmation that HERV-E LTRs act as alternative promoters for the human EDNRB and APOC1 genes. In this chapter, we determined that both LTRs promote transcription in vivo and in vitro but appear to differ in their respective activity and tissue-specificity. We also established the integration time and sequence of the two retroviral elements in the primate lineage.

Chapter 3
The promoter of the EDNRB-associated LTR is further characterized through mutation and deletion analysis.
Differences in sequence and promoter activity between the APOC1 and EDNRB LTRs were also exploited in this chapter to delineate the regions and motifs required for strong EDNRB LTR activity in placenta.

Chapter 4
This section details the identification and investigation of a retroviral promoter for the MID1 gene. Results from real-time PCR and transfection studies are shown which determined that the HERV-E promoter contributes to MID1 levels in placenta and embryonic kidney. In this chapter, we also dissected this LTR to elucidate regions that confer placental activity and specificity.

Chapter 5
In addition to the retroviral first exon described in chapter 4, several other mRNA isoforms with variant 5' ends had been reported for MID1. However, their genomic organization and expression, as well as the promoter regions regulating their transcription, had not been determined. Chapter 5 reports the characterization of these alternative MID1 first exons and their associated promoters in human and other species.

Chapter 6
This chapter summarizes the results and discusses how they have improved our understanding of a role for human endogenous retroviral elements in the transcriptional regulation of human genes. Further investigations which could complement this study are also suggested.

Appendix
This section presents studies performed to determine the effects of Alu, L1 and HERV sequences present in the 5' UTR of the human zinc finger gene ZNF177. Our findings indicate that while the retroviral element does not appear to influence the gene's expression, the Alu and L1 repeats exert a positive transcriptional enhancing effect but repress translation of the zinc finger gene.

Chapter 2: Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans

This chapter is based on results from a manuscript entitled as above by P. Medstrand, J.-R. Landry and D.L.
Mager, published in 2001 in the Journal of Biological Chemistry, 276:1896-1903, as well as unpublished results by J.-R. Landry. P. Medstrand performed the 5' RACE to confirm the authenticity of the chimeric transcripts as well as the RT-PCR amplification of the APOC1 and EDNRB transcripts from various tissues displayed in Figure 2.1.

2.1 Introduction

A previous post-doctoral fellow in the Mager lab, Patrik Medstrand, searched the expressed sequence tag and transcribed subset of GenBank for chimeric retroviral-gene sequences in order to study the involvement of HERVs in human gene expression. Transcripts isolated with only the U3-R part of the LTR and no other HERV sequence were assumed to correspond to an mRNA polyadenylated by the LTR, whereas transcripts with R-U5 or R-U5-leader were believed to represent promotion by an LTR. Through these database screenings, he identified two transcripts in which the LTRs of HERV-E elements were fused to the coding regions of the endothelin B receptor (EDNRB) gene (accession number D90402) and the apolipoprotein C1 (APOC1) gene (accession number W79313). The structure of the chimeric mRNAs suggested that the EDNRB transcript utilizes the SD in the leader region of the HERV element, which is located downstream of the 5' LTR. The same SD is used in the subgenomic splicing of a HERV-E envelope transcript, suggesting that this was the original SD of HERV-E (Rabson et al. 1985). The APOC1 fusion transcript represented another possible LTR-driven transcript type, which is derived from a solitary LTR and reads into the flanking non-HERV region.

The detection of these two new possible cases in which HERV LTRs appeared to be used as alternative promoters for cellular genes provided the rationale for my thesis project. As a starting point, we further characterized the chimeric transcripts and the retroviral elements that generated them.
The utilization of HERV-E LTRs as alternative promoters for the APOC1 and EDNRB genes and the presence of the chimeric transcripts in human tissues are demonstrated in this chapter, and the significance of the LTRs at the genomic loci of these two genes is also illustrated.

2.2 Materials and Methods

2.2.1 Rapid amplification of cDNA ends

5' RACE was performed using a Marathon-ready placenta and brain cDNA library (Clontech) according to the manufacturer's protocol. The first round of PCR amplification was carried out using EDNRB-specific oligo 184 and AP1 (provided by the supplier) as well as with the APOC1 primer 179 and AP1. The nested round of amplification was performed using the EDNRB oligo 185 and the AP2 primer (provided by the supplier) and with the APOC1 primer 180 and AP2. All the oligonucleotides used for RACE and other procedures are listed in Table 2.1. The following temperature profile was used for all amplifications: one initial denaturing step at 95 °C for 1 min, followed by 35 cycles of 95 °C for 30 s and annealing and extension at 68 °C for 4 min. The 5' RACE products were cloned using the pGEM-T vector system I (Promega). Clones were selected for sequencing after hybridization using retrovirus-specific oligonucleotides 181 for the APOC1-associated LTR and 186 for the EDNRB LTR.

2.2.2 Reverse Transcription and PCR Amplification

First-strand cDNA was synthesized as previously described (Medstrand et al. 1992) using random primers, Superscript II Reverse Transcriptase (Gibco BRL) and 10 µg of RNA. RNA samples were either obtained from Clontech or prepared from different sections of placenta, as described previously (Wilkinson et al. 1990).
The following primers were used to detect the different transcript forms shown in Figure 2.1 (see Table 2.1 for primer sequences): LTR-APOC1 fusion transcript, primers 181-179; native APOC1 transcript, primers 182-179; LTR-EDNRB fusion transcript, primers 186-183; native EDNRB transcript, primers 187-183. Amplifications were performed using the following cycling profile: one initial incubation at 95 °C for 1 min, followed by 35 cycles (for the APOC1 amplifications) or 30 cycles (for the EDNRB amplifications) of 95 °C for 30 s, 63 °C for 30 s, and 72 °C for 30 s, and one final elongation at 72 °C for 5 min.

2.2.3 Plasmid Constructs

Promoter constructs were designed by amplifying the 5' flanking regions of APOC1 and EDNRB from human genomic DNA and subcloning them upstream of luciferase in the promoter-less plasmid pGL3B (Promega).

The following EDNRB constructs were made. For EDNRB Native, approximately 1 kb of flanking region upstream of the native EDNRB transcription initiation site was isolated using primers 165 and 166 (positions -1259 and -187 relative to the translation initiation site) and inserted into the KpnI/BglII site of pGL3B. For EDNRB LTR, the flanking region around the LTR was amplified using primers 16 and 17, followed by a nested reaction in which the complete LTR was amplified with primers 18 and 19 and inserted into KpnI/BglII-digested pGL3B. The enhancer constructs, EDNRB LTR-Native, were made by introducing the LTR at a distance from the native promoter region of construct EDNRB Native. The full LTR was amplified with the flanking primers 16 and 17 and the LTR-specific primers 167 and 168 and introduced into the BamHI site of construct EDNRB Native, which is located 2 kb from the KpnI/BglII site on pGL3B. Constructs carrying the LTR in either the sense (LTR-S) or antisense (LTR-A) orientation with respect to the native EDNRB promoter region were isolated.

The following APOC1 constructs were made.
For APOC1 Native, the 5' flanking region of the APOC1 transcription initiation site was isolated using primers 171 and 172 (positions -1271 to -145 relative to the translation initiation site) and inserted into KpnI/BglII-digested pGL3B. This construct contains both the native and LTR promoter regions. For APOC1 LTR, the complete LTR of the APOC1 locus was amplified by nested PCR with primers 13 and 14 using the products obtained with primers 171 and 172 as template. For APOC1 Native -LTR, the LTR was removed from the APOC1 locus by amplifying the non-LTR parts of APOC1 Native with primers 171-173 and 174-172. The two amplification products were digested with XbaI, ligated together, and introduced into pGL3B after KpnI/BglII digestion. This construct has the same structure as the APOC1 Native plasmid except that the LTR is absent.

The following baboon APOC1 constructs were made. For Baboon APOC1 Native, the baboon APOC1 locus was amplified from baboon genomic DNA with the primers 171-172 and inserted into KpnI/BglII-digested pGL3B. This construct does not contain the LTR, as the retroviral element is not integrated in baboon genomic DNA. However, the human LTR was inserted in the baboon native promoter region in the construct Baboon APOC1 Native +LTR. For this construct, the baboon APOC1 locus was amplified with primers 171-175 and 176-172. The two amplification products were digested with BamHI, ligated together and inserted into pGEM-T (forming the temporary construct BAPO-T). The LTR was then amplified with primers 177-178 and introduced into the BamHI site of BAPO-T. After selection of LTR-positive clones, the KpnI/BglII cassette (containing the LTR in the baboon APOC1 locus at the site orthologous to that in humans) was subcloned into pGL3B.

For all of the APOC1 constructs, the hepatic control region (HCR) was isolated using primers 169-170.
The PCR product (which represents positions 36815-37218 of accession number AF050154) was digested with XhoI. A purified fragment was introduced into the pGL3B SalI site of all APOC1 constructs, which is 3' to the luciferase gene.

2.2.4 Cell Lines and Transient Transfections

HepG2 cells (human hepatoblastoma) were cultured in Dulbecco's minimal essential medium supplemented with 10% fetal calf serum. Cells were seeded 24 h prior to transfection in six-well plates at a density of 3 x 10^5 cells/well. Transient transfections of HepG2 were done by cotransfecting 1.5 µg of plasmid DNA and 50 ng of pRL-TK vector (Promega) using calcium phosphate (Cellphect; Amersham Pharmacia Biotech) as described in the protocol supplied with the reagent. JEG-3 cells (human choriocarcinoma) were maintained in RPMI supplemented with 5% fetal calf serum. JEG-3 cells were seeded in six-well plates at a density of 2 x 10^5 cells/well and cotransfected 24 h later with 1.0 µg of plasmid DNA and 200 ng of pRL-TK using 7 µl of LipofectAMINE (Life Technologies, Inc.), as described in the protocol from the supplier. After 24 h, the cells were lysed, and the luciferase activities were measured using the Dual-Luciferase Reporter Assay System (Promega) and normalized to the internal control. Transfections were performed in triplicate and repeated at least twice.

2.2.5 Locus-specific PCR

Locus-specific PCR was performed using genomic DNA prepared from marmoset (New World monkey), baboon (Old World monkey), gibbon, orangutan, gorilla, chimpanzee, and human cell lines (Goodchild et al. 1993). Primers 171 and 172 were used to detect the presence or absence of the solitary LTR in the APOC1 locus of different primates. The presence of the LTR upstream of the EDNRB locus was detected by amplification using primers 16-7, but the complete LTR was amplified for sequencing by using primers 16-17.
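The logic by which presence/absence calls from locus-specific PCR bracket an integration event can be sketched in a few lines of Python. This is only an illustration and not part of the original analysis: the function name integration_window, the dictionary of calls and the linear ordering of the panel from the most distant relative of human to human are assumptions introduced here, and the sketch assumes a single integration with no subsequent loss of the element. The example call uses the pattern reported for the APOC1 LTR in section 2.3.3 (absent in marmoset and baboon, present in all hominoids).

```python
# Primate panel from section 2.2.5, ordered from the most distant relative of
# human to human itself (an illustrative simplification of the phylogeny).
PANEL = ["marmoset", "baboon", "gibbon", "orangutan", "gorilla",
         "chimpanzee", "human"]


def integration_window(has_ltr):
    """Bracket an integration event from PCR presence/absence calls.

    has_ltr maps each species in PANEL to True (LTR-containing product
    detected) or False (empty site). Returns (last_without, first_with):
    the most human-proximal species lacking the LTR and the most distant
    species carrying it. The integration is inferred to have occurred after
    the lineage leading to human diverged from last_without.
    """
    calls = [has_ltr[species] for species in PANEL]
    if not calls[-1]:
        raise ValueError("LTR absent in human; nothing to date")
    first = calls.index(True)  # most distant species carrying the LTR
    # Hypothetical sanity check: a single integration without later loss
    # predicts that every species closer to human also carries the LTR.
    if not all(calls[first:]):
        raise ValueError("pattern inconsistent with a single integration")
    last_without = PANEL[first - 1] if first > 0 else None
    return last_without, PANEL[first]


# Pattern reported for the APOC1 LTR in section 2.3.3.
apoc1_calls = {"marmoset": False, "baboon": False, "gibbon": True,
               "orangutan": True, "gorilla": True, "chimpanzee": True,
               "human": True}
print(integration_window(apoc1_calls))  # ('baboon', 'gibbon')
```

A result of ('baboon', 'gibbon') corresponds to an integration after the divergence of hominoids from Old World monkeys; converting such a bracket into an age still requires externally published divergence dates.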
Table 2.1 Primers used for APOC1 and EDNRB genomic and cDNA amplifications

Oligo  Sequence                          Description
7      GATCGACCCCTGACCTAACC              EDNRB in LTR
13     cggggtaccTAAGGGAGGAGACCACCCCT     APOC1 LTR KpnI
14     gaagatctTGTAGCAGGAGGAGCCGCAG      APOC1 LTR BglII
16     AACATCCTCTGTCTCTCC                EDNRB LTR flanking
17     CCAGTTCCTTCCCAGTGT                EDNRB LTR flanking
18     cggggtaccTAAGGGAGGATACCACC        EDNRB LTR KpnI
19     gaagatctTGTAGCAGGACAAGCTGC        EDNRB LTR BglII
165    ggggtaccACATGGTGCGTGATAACTTGC     EDNRB Native Kpn
166    gaagatctGGAAACTTGGCGCTCATGACT     EDNRB Native BglII
167    cgggatccTAAGGGAGGATACCACC         EDNRB LTR S BamHI
168    cgggatccTGTAGCAGGAGAAGCTGC        EDNRB LTR A BamHI
169    ccgctcgagTTAGAGAACAGAGCTGCAGGCT   APOC1 HCR S Xho
170    ccgctcgagGCTTCGGGGTCGGGGCAT       APOC1 HCR A Xho
171    ggggtaccGGTTTTTACAGTGTCATCCAGCT   Hapo Kpn
172    gaagatctCACTCAGCACCAACCTGAATC     Hapo BglII
173    gctctagaAGACAGCCTTGTCCTGTC        Hapo (-LTR) Xba
174    gctctagaGTGGGGAAAGGGACTAAGGT      Hapo (-LTR) Xba
175    cgggatccCAGCCTCGTCCTGTCCCTGT      Bapo BamHI
176    cgggatccGTGGGGAAAGGGACTGTAGTAG    Bapo BamHI
177    cgggatccTAAGGGAGGAGACCACCCCT      Bapo LTR BamHI
178    cgggatccTGTAGCAGGAGGAGCCGCAG      Bapo LTR BamHI
179    AGCCGCATCAAACAGAGTGAACTT          APOC1 RACE1
180    TCCTCCTGCTACATTCTGAGTGG           APOC1 RACE2
181    GTCTGAGGAATTTTGTCTGCGGCT          APOC1 LTR
182    CCAAGCCCTCCAGCAAGGATTC            APOC1 Native
183    AGTCTATGTGCTCTGAGTATTGAC          EDNRB RT-PCR
184    GACGCCACCCACTAAGACCTTATG          EDNRB RACE1
185    GACGCCTTCTGGAGCAGGTAGCA           EDNRB RACE2
186    CATGGAGGATCAACACAGTGGCT           EDNRB LTR
187    TTACTTTTGAGCGTGGATACTGGC          EDNRB Native

2.3 Results

2.3.1 Identification and characterization of chimeric transcripts

To authenticate the presence of fusion APOC1 and EDNRB transcripts, we synthesized primers corresponding to the retroviral and the gene-specific regions of the identified transcripts. By using this primer combination in RT-PCR, it is possible to detect the presence and the relative abundance of the fusion transcripts in human tissues.
Both of the genes were previously reported as having a different promoter region, separated from the retroviral LTR (Lauer et al. 1988; Nakamuta et al. 1991; Ogawa et al. 1991). We will refer to these two regions as the "native" APOC1 and EDNRB promoters, respectively. To detect any biases of the LTR and native transcripts, we also used a primer unique to transcripts of the native promoters of the two genes. Results of the RT-PCR on a panel of RNAs derived from different human tissues are shown in Figure 2.1. The LTR-promoted EDNRB transcript is restricted to placenta, where its levels appear comparable with those of the widely expressed native transcript, as shown in Figure 2.1 (B). In the case of APOC1, transcripts from the native promoter were high in liver, as expected (Lauer et al. 1988), but were also detectable by PCR in many of the other RNAs tested (see Figure 2.1 (A)). Transcripts from the solitary LTR were detected in two distinct forms (see Figure 2.2), both of which were also detected in many tissues. The result of this experiment clearly demonstrates the presence of fusion transcripts between LTRs of HERV-E and the genes for EDNRB and APOC1. The LTRs at the EDNRB and APOC1 loci vary in their tissue specificity, with the EDNRB LTR being much more restricted in activity. Sequencing of the PCR products verified the nature of the fusion transcripts, where the two fusion transcript forms of APOC1 are the result of differential splicing in the 5' UTR (Figures 2.2 and 2.3).

Figure 2.1 Detection of APOC1 and EDNRB chimeric transcripts in different tissues. (A) Upper panel, LTR-APOC1 fusion transcripts were detected by using LTR and APOC1 exon-specific primers in RT-PCR. Lower panel, amplification products derived from primers detecting transcripts of the native APOC1 promoter. (B) Upper panel, detection of LTR-EDNRB fusion transcripts by using leader (derived from the provirus) and EDNRB exon primers.
Lower panel, result of amplification using primers specific for transcripts derived from the native EDNRB promoter. Expected amplification product sizes were obtained for the different primer combinations.

2.3.2 Genomic structure and transcript forms

To confirm that the APOC1 and EDNRB fusion transcripts initiate within the LTRs and do not represent transcripts from a promoter located upstream of the LTRs, we isolated the 5' ends of both LTR fusion gene transcripts. Using a 5' RACE protocol, we established that both the APOC1 and EDNRB fusion transcripts initiate within their LTRs (see below and Figures 2.2 and 2.3). Sequencing of several 5' RACE clones showed that the APOC1 and EDNRB initiation site is located downstream of a previously reported TATA box of HERV-E (Rabson et al. 1985; Repaske et al. 1985). This is the TATA box also used by other HERV-E proviruses, because a full-length transcribed HERV-E element (GenBank accession number M74509) starts 2 bp downstream of the initiation site of the APOC1 LTR. In the case of the EDNRB fusion transcript, the sequence representing the longest 5' UTR also began within the LTR, but at a position 90 bp 3' to the APOC1 initiation site.

Both the APOC1 and EDNRB genomic loci were partially characterized at the time of our initial studies. The only retroviral remnant of the original proviral insertion at the APOC1 locus is a solitary LTR, which is located 300 bp upstream of the native APOC1 promoter. The two initiation sites are separated by 390 bp, where the initiation sites of the native and LTR promoters are located 180 and 575 bp upstream of the APOC1 initiation codon, respectively (Figure 2.2). The EDNRB LTR was not present in the reported 2 kb sequence upstream of the EDNRB native promoter (GenBank accession D13162), which is located ~250 bp upstream of the EDNRB initiation codon (Nakamuta et al. 1991; Ogawa et al. 1991).
A genomic clone containing both the HERV-E proviral element and the EDNRB genomic locus had recently been deposited in GenBank (accession number AL139002) at the time the chimeric transcript was identified. The sequence of this clone was in a preliminary state of assembly, but the sequence from this region of the genome has since been assembled. A 6 kb HERV-E element with LTR and leader regions identical to the retroviral sequence present in the chimeric EDNRB mRNA was found to be present 52 kb upstream of the EDNRB initiation codon.

In addition to the retroviral 5' UTR, three other EDNRB mRNA forms were reported by another group to differ in their first exon (Tsutsumi et al. 1999). While two of the novel transcript forms encoded identical proteins, an alternatively spliced form gave rise to a longer endothelin B receptor N-terminus as a result of the presence of an upstream in-frame start codon (Tsutsumi et al. 1999). The 5' variants, containing exons 1b, 1c and 1c-2, as well as the previously cloned EDNRB transcript, with exon v2, appeared to share a common promoter region, as their transcription initiation sites clustered within a 1 kb region (see Figure 2.3). The first exons of the EDNRB transcripts containing exons 1b and 1c and the 5' LTR leader of the HERV-E element (exon 1a) are joined by splicing to the same splice acceptor in exon 2. The 5' UTRs of EDNRB mRNAs containing exons 1c-2 and v2 are not spliced. The genomic organization and the structures of the different transcripts arising from the retroviral and native promoter regions are shown in Figure 2.3. Due to the retroviral sequence, the fusion transcripts have partially different 5' UTRs compared with the native forms, but all maintain the same APOC1 and EDNRB coding regions (with the exception of one EDNRB variant expressed from the native promoter).

Figure 2.2 Schematic representation of the human APOC1 gene transcript isoforms.
(A) The structure of genomic D N A of APOC1, where the position of the solitary LTR, depicted as an arrow, is shown with respect to the APOC1 exons, which are shown as rectangles. The native promoter, represented as a circle, is indicated upstream of exon 2. (B) Schematic illustration of three different forms of APOC1 transcripts. The two LTRAPOC1 forms were determined by RT-PCR and 5' R A C E . The APOC1 form derived from the native promoter was reported previously (Lauer et al. 1988). Distances are not drawn to scale.  52  c ...ggatgagatcacagggttattactgggagacccctgagggaagatggccacagggacaggacaaggctgtcttct taagggaggagaccacccctcatattgtcttatgcccaatttctgcctccaaagaaagaaaaagtaaaaactaaaa \ ggcagaaatgaaatccacaagcagacagcccgcgccacaccctgggcctggtggttaaagattgacccctgaccta atccgttaggttatctatagattacagacattgtatagaaaagcactgtgaaaatccctattctgttttgttccga  LTR  tctaattaccggtgcatgcagcccccagtcacgcatcccctgcttgttcaatcgatcacgaccctctcacgtgcacc cacttagagttgtgagcccttaaaaggaacagggattgctcac  TCGGGGAGCTCGGCTCTTGAGACAGGAATCTT  GCCCATTCCCCGAACGAATAAACCCCTTCCTTAACTCAGCGTCTGAGGAATTTTGTCTGCGGCTCCTCCTGCTACA  TTCTGAGTGGGGAAAGGGACTAAGGTGGTCTGAGGACCCCACAGAGTCAGGAAGATTGAGAG  Exon 1  gtgagagtgctgaa  cggggaggggctttggggctaagggaagtgcccgggaccccacctgaccccaacgctcacgggacaggggcagagga gaaaaacgtgggtggacagagggaggcaggcggtcaggggaaggctcaggaggagggagatcaacatcaacctgccc cgccccctccccag | C C T G A T A A A G G T C C T G C G G G C A G G A C A G G  i  ^CCTCCCAACCAAGCCCTCCAGCAAGGATTCAG  I  gttggtgctgagtgcctgggagggacacccgcctacactctgcaagaaactcaaaaagggagatgaggggatcgtgg gagggaggtagggagggaggagggtgccactgatcccctgaacccctgcctctgcctccag A G T G C C C C T C C G G C C rTCGCCATGAGGCTCTTCCTGTCGCTCCCGGTCCTGGTGGTGGTTCTGTCGATCGTCTTGGAAG  Exon 2 Exon 3  btaaaagtgggat  gggagaattgcggagttggagatttggaagagtgaaggtggctacaggcctggggtcccggcttagaggacctctg...  (C) D N A sequence of the promoter regions upstream ofAPOCl. The solitary L T R sequence is enclosed by a dashed pointed rectangle and the putative T A T A box is shown in bold. 
Nonintronic transcribed sequence is shown in uppercase and framed, where exon 1 initiates in the LTR and is spliced to exon v2, the beginning of which is shown by a dashed box. Exon 2 is the start site derived from the native promoter and is framed.

[Figure 2.3, panels A and B: diagram not recoverable from the extracted text. Accession numbers shown beside the alternative first exons in panel B: S57283, AF114165, AF114164, AF114163, S44866.]

Figure 2.3 Organization of the alternative 5' UTRs of the human EDNRB gene. (A) Genomic organization of EDNRB, where the position of the HERV-E retroviral element, depicted as a rectangle flanked by two arrows, is shown with respect to the second exon of EDNRB. The location of the putative TATA box is indicated in the LTR (arrow) and the splice donor site is indicated by SD. The HERV-E element is approximately 6 kb in length and resides 52 kb upstream of the "native" promoter region, which is depicted as a circle. The transcription of four variant 5' UTR isoforms appears to initiate within 1 kb downstream of the native promoter. The alternative transcripts possess identical coding regions starting with the ATG indicated in exon 2, with the exception of isoform 1b, which contains an upstream ATG (in a dashed box). (B) Illustration of the alternative 5' transcript forms. The accession numbers from which the 5' variants are derived are indicated beside the alternative first exons.
54  c  taagggaggataccacccctcatattgtcttatgcccaatttctgcctctgaagaaagaataagtaaaaactaaaag \ gcagaaatgaaatccactggcagacagtctggtgccacaccctgggtctggtagttaaagatcgacccctgacctaa ccggttatgttatctatagattccagacattgtatggaaaagcactgtgaaaattcctgtcctgttctgttctg'atc tgactaccagtgcatgcagcccccagttatgtacctgctgcttgctcaatcaatcacaaccctttcacgcagacccc cttagagctgtgagcccttaaaagggataggaattgctcactcagagagctcaqctcttgaqacaqgagtcttqccga t g c t t c t g g c c g a a t a a a c c t c t t c t t t c t t t a a t t c g g t a t c c g a g g a a t t t [ TGTCTGCAGCTTGTCCTGCTACA / TTTCCTGGTTCCCTGACTGGGAAGTGAGGTGATTGGTGGATGGTCGAGGCAGCTCCTTAGGTGACTTAAGCCTGCCC  Exon l a  TGTGGAACATTCCGGTGGGGGACTCTGGCCAGCCCGAGCAACGTGGATCCTGAGAGCACTCCCAGGTAGGCATTTGC CCCGGTGGGACGCCTTGCCAGAGCAGTGTGTGGCAGGCCCCCGTGGAGGATCAACACAGTGGCTGAACACTGGGAAG GAACTGGTACTTGGAGTCTGGACATCTGAAACTTGG|taagactagtctttggaacttgcccactccatttgagtag a g c g t g g c t t g c t c a c / 57 kb / a a c t t g c c c t t g a t t t g g g t t c a t t t g a a g a g c g t a g a a c t c t a a c a a a t AAACAGCCTTTTGGGACCTGTCCCCGGACGAGGACTGCCCCCCTCCCTCGGGCAACTACTACTGATGCTGTCCAGGC ATCGCCCAAGGGGAAAGGTTGCAGCGGGGTCGGAAGGCGCGGGAGGAGTCTGGCGGTGATTGATGGGAAGGGATGAA TGAATAAAAGTACTTGTCTGATGGCAGCAGAGACCCCGAGCAAACGGTGGAGGCTACACTGTCTGGCATTCTCGCAG  Exon l b  CGTTTCGTCAGAGCCGGACCCGCCTGCAGCTCAAGGGAGGCGTGCTCCTCTCCCAGAGCAGGCTGGAACCCAGCTGG GTTCCGCCTCCCGGGAAGGTGGTCTCCATTCGTCGCTCTGCATCTGGTTTGTCAGATCCGAGAG I g t a a a c a t t c g g gcttggtgttgaattaaaatcattgatt  /0.55 kb / i g a g g g c a t c a g g a a g g a g t t t c g a c c c g c g c t g g c g a i  Exon v2  gtcatgagcgccaagtttcccactggcgcgcaaacttgagttacttttgagcgtggatactggcgaagaggctgcgg gcggtattagcgtttgcagcgacttggctcgggcagctgacccaagtgtcctgtcttccttcctctgcttgtctcta GGCTCTGAAACTGCGGAGCGGCCACCGGACGCCTTCTGGAGCAGGTAGCAGCATGCAGCCGCCTCCAAGTCTGTGCG  Exon 2  GACGCGCCCTGGTTGCGCTGGTTCTTGCCTGCGGCCTGTCGCGGATCTGGGGAGAGGAGAGAGGCTTCCCGCCTGA...  (C) Genomic sequence of the H E R V - E element in the EDNRB locus. 
The 5' LTR is enclosed by a dashed pointed rectangle and the putative TATA box is shown in bold. The exon sequences are shown in uppercase and framed. The first part of exon 1a is located in the LTR and is joined to exon 2 by splicing. The splice donor is located in the proviral leader region, and the splice acceptor (SA), defining the start of exon 2, is located 57 kb downstream. Two non-retroviral first exons are shown which are derived from a shared native promoter region. Exon 1b is framed, while variant v2, which is fused and not spliced to exon 2, is enclosed by a dashed box. Note that exon 1c, which appears to initiate 7 bp downstream of the end of exon 1b, is not indicated.

2.3.3 Evolutionary age of the LTRs

Using primers flanking the integration sites of the LTRs in PCR of different primate DNAs, we earlier assigned the time of integration of various HERV-K elements during primate evolution (Medstrand and Mager 1998). Using the same approach, we were able to determine when the two HERV-E LTRs integrated in the primate lineage. The APOC1 LTR was detected in all hominoids, whereas Old and New World monkeys did not have this LTR integrated in the APOC1 locus, suggesting that the integration took place after the divergence of hominoids and the Old World monkeys, about 20-30 million years ago (Sibley and Ahlquist 1987). The sequences of the APOC1 LTR isolated from human and different apes are displayed in Figure 2.4. Since we could detect the presence of the EDNRB LTR both in baboons and hominoids, but not in New World monkey, we conclude that the LTR at the EDNRB locus is older than the APOC1 LTR, because it integrated after the split between the New and Old World monkeys, ~30-40 million years ago (Sibley and Ahlquist 1987). The EDNRB LTRs sequenced from various primates are shown in Figure 2.5. Sequence comparison of the 5' and 3' LTRs of the human EDNRB HERV-E revealed that they are 12% divergent.
The same time estimate of 30-40 million years is obtained by assuming that the two LTRs each diverged an average of 6% since integrating in the primate lineage, taking a pseudogene divergence rate of 0.15-0.21% per million years into account (Li and Tanimura 1987; Tristem 2000).
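The divergence-based age estimate in section 2.3.3 amounts to a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that both LTRs were identical at integration, that each has since accumulated half of the observed pairwise divergence, and that no correction for multiple substitutions at the same site is applied.

```python
def integration_age_range(pairwise_divergence_pct, rate_low, rate_high):
    """Return (youngest, oldest) integration age in million years.

    pairwise_divergence_pct: % divergence between the 5' and 3' LTRs
    rate_low, rate_high: substitution rate bounds in % per million years
    """
    # Assumption: the two LTRs started out identical and diverged
    # independently, so each carries half of the pairwise divergence.
    per_ltr_pct = pairwise_divergence_pct / 2.0
    youngest = per_ltr_pct / rate_high  # faster clock gives a younger age
    oldest = per_ltr_pct / rate_low     # slower clock gives an older age
    return youngest, oldest


# Values quoted in the text: 12% divergence, 0.15-0.21% per million years.
youngest, oldest = integration_age_range(12.0, 0.15, 0.21)
print(f"Estimated integration: {youngest:.0f}-{oldest:.0f} million years ago")
```

With the values quoted in the text this gives roughly 29-40 million years, in line with the 30-40 million year estimate obtained from the primate PCR survey.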
Figure 2.4 Multiple sequence alignment of the APOC1 LTR of different primates. Comparison of the solitary HERV-E LTR associated with the APOC1 gene in different species (human, chimpanzee, gorilla, orangutan and gibbon). Nucleotides identical in all sequences are shown in black while those with at least 80% identity are highlighted in grey.
Figure 2.5 Multiple sequence alignment of the EDNRB LTR in various primates. Comparison of the 5' LTR of the HERV-E element associated with the EDNRB gene in different species (human, gorilla, chimpanzee, orangutan, gibbon and baboon). Nucleotides identical in all sequences are shown in black while those with over 80% identity are highlighted in grey.

2.3.4 Significance of the HERV-E LTRs in expression of APOC1 and EDNRB
To investigate the significance of the LTR in expression regulation of the APOC1 gene, we inserted the native promoter region, which naturally contains the LTR and the native promoter, into a promoterless luciferase reporter plasmid (pGL3B), upstream of the reporter gene. We also tested the activity of the LTR by itself and of the native construct from which the LTR was removed. We then performed transient transfections to test the relative levels of promoter activity of the different constructs. The LTR was also inserted at a distance in constructs with the APOC1 promoter from which the LTR was removed, to test for the possibility that the LTR acts as an enhancer of the native promoter. The expression in liver is completely dependent on a distal HCR (Dang et al. 1995) and we saw little promoter activity of the APOC1 constructs without the presence of this HCR. However, the promoter strength of the LTR did not vary dramatically whether or not the HCR was present (results not shown).
The results of the transfections of HepG2 (liver) cells with a variety of APOC1 constructs are shown in Figure 2.6 and suggest that the LTR by itself is not contributing significantly to the overall expression levels of APOC1 in liver. However, when the LTR is removed from the APOC1 locus, the promoting activity of the region drops about 40% in HepG2, suggesting that the presence of the LTR in the APOC1 locus contributes to the overall activity of the native promoter region. However, we found no evidence that the LTR alone acts as an enhancer in liver cells when positioned at a greater distance from the native promoter.

Figure 2.6 Effect of the LTR on APOC1 promoter activity in human and baboon. Schematic representation of the promoter constructs, in which the native human APOC1, baboon APOC1 and LTR fragments were inserted upstream of the luciferase (Luc) vector pGL3B. Constructs where the LTR was removed from the human or added to the baboon APOC1 promoter region were used as a comparison with the native constructs. On the right are the results of the luciferase activities obtained from the different constructs after transient transfection in HepG2. The luciferase activities obtained with each plasmid are corrected for transfection efficiency with the Renilla luciferase pRL-TK plasmid and are presented as fold increase over the activity of the promoter-less pGL3B vector, which was assigned a value of 1. Each bar is the mean of the relative luciferase activity from at least 2 experiments +/- SD.

A test of the effect the LTR had at the time of integration in the primate lineage would be to insert the LTR into the APOC1 locus of a species that naturally lacks the LTR.
Our analysis showed that all hominoids have the LTR integrated in the APOC1 locus but that it is absent in the baboon. The sequence of the baboon APOC1 locus has been determined (Pastorcic et al. 1992), and sequence alignments of the human and baboon loci verified the absence of the LTR in the baboon (not shown). We inserted the LTR into the baboon APOC1 locus at the orthologous site, and compared the relative promoting activity between the constructs with and without the LTR. The LTR insertion into the baboon locus resulted in a small increase in expression, similar to that seen in the human locus, suggesting that the LTR had a similar effect when it first integrated in the primate lineage (Figure 2.6).

To investigate the effect of the LTR in EDNRB expression, the native EDNRB promoter region or the LTR alone were inserted upstream of the luciferase gene of pGL3B. We also inserted the LTR at a distance, in direct and opposite orientation with respect to the native EDNRB promoter region, to test for a potential enhancing effect of the LTR on the EDNRB native promoter region. The choriocarcinoma cell line JEG-3 was transiently transfected with these constructs, and the results are shown in Figure 2.7. The activity of the native EDNRB promoter segment alone is low, suggesting that the native EDNRB promoter is dependent on an enhancer element not present in the constructs or on a factor that is absent in the cell line. However, when the LTR was inserted in either direction at a distance with respect to the native promoter, a significant increase in activity was observed, indicating that the LTR can act as an enhancer of the native promoter region extrachromosomally. When constructs containing only the LTR upstream of the luciferase gene were transfected into JEG-3, a very high activity was observed in comparison with the other constructs in JEG-3 or the SV40 promoter control plasmid pGL3p.
The high activity of the LTR in JEG-3 and absent activity in HepG2 (not shown) agrees with the RT-PCR results, where the LTR-EDNRB fusion transcripts were detected only in placenta. As an independent control of enhancing activity of the LTR, constructs with the LTR upstream of the SV40 promoter (pGL3p) were transfected into JEG-3. Independent of the orientation of the LTR with respect to the SV40 promoter, a 7- to 10-fold increase in activity was seen relative to constructs with the SV40 promoter alone (data not shown), suggesting that the LTR also enhances the SV40 promoter in placental cells.

Figure 2.7 Promoter and enhancer activity of the EDNRB LTR. The native promoter region and the 5' HERV-E LTR of the human EDNRB gene were inserted upstream of the promoterless luciferase reporter vector pGL3B, as shown on the left. On the right are the results of the luciferase activities obtained from the different constructs after transient transfection in Jeg-3. The luciferase activities obtained with each plasmid are corrected for transfection efficiency with the Renilla luciferase pRL-TK plasmid and are presented as fold increase over the activity of the promoter-less pGL3B vector, which was assigned a value of 1. Each bar is the mean of the relative luciferase activity from at least 2 experiments +/- SD.

2.4 Discussion
In this study, we detected and characterized alternative transcripts of the APOC1 and EDNRB genes with HERV-E sequences at their 5' termini. Both fusion transcripts are expressed in a variety of human tissues and were shown by 5' RACE to initiate downstream of a putative TATA box within HERV-E LTRs, demonstrating that the LTRs are alternative promoters for these genes in humans. For APOC1, we found transcripts derived from the LTR promoter in various tissues, including liver. The significance of APOC1 in other tissues is not known, and the general transcription levels are lower than observed in liver.
However, the LTR and native promoters appear to be equally active in many of the other tissues tested. In the case of EDNRB, it should be noted that the LTR-EDNRB fusion transcript was first isolated from a placental library by Arai et al. (1993) but was considered to be a gene rearrangement or artifact due to the LTR in the 5' UTR, which differed from the originally described UTR region of EDNRB (Nakamuta et al. 1991; Ogawa et al. 1991). RT-PCR analysis using cDNAs derived from various tissues indicated that the chimeric EDNRB form was present only in placenta. However, in our 5' RACE of placental cDNA, the major form corresponded to transcript sizes derived from the native promoter, indicating that this is the most abundant transcript form in placenta.

Although the LTRs of the APOC1 and EDNRB loci are 85% identical in sequence, the expression pattern of the two fusion transcripts is different: activity of the LTR-EDNRB is restricted to placenta, whereas the APOC1 LTR-derived transcripts are detected in many tissues. It is possible that restrictive expression of the LTR-EDNRB transcript is due to methylation of the HERV locus in adult tissues. Methylation is a widely used mechanism employed by mammalian cells to restrict the expression of unwanted gene products and retroelements (Yoder et al. 1997; Walsh and Bestor 1999). The APOC1 LTR may be protected from methylation and thereby expressed in adult tissue, due to its close proximity to the native APOC1 promoter region. Another explanation for the different transcription pattern is that acquired nucleotide substitutions have specifically destroyed or created transcription factor binding sites in the two LTRs. The nucleotide divergence of the LTRs is probably a direct effect of substitutions acquired after their integration into the genome.
We estimated that the LTR integrated into the EDNRB locus about 30-40 million years ago and that the LTR integrated into the APOC1 locus 20-30 million years ago. It is likely that HERV-E elements were actively transposing in the primate lineage during this time period, because the previously characterized HERV-E element 4-14 and the HERV-E of the pleiotrophin locus are of similar age (Shih et al. 1991; Schulte and Wellstein 1998). As is the case for many other HERV families, no recent integrations involving this endogenous family have been observed, indicating that the HERV-E elements are deeply fixed in the primate lineage.

Transient transfections were performed to test the significance of the LTRs in the genomic loci of APOC1 and EDNRB. The results using the APOC1 constructs suggested that the LTR alone does not contribute significantly to the overall expression levels of APOC1 in liver. However, when the LTR was removed from the APOC1 locus, the promoting activity of the APOC1 locus dropped about 40% in HepG2 cells. This result suggests that the presence of the LTR in the APOC1 locus contributes to the overall activity of the native promoter region, perhaps by providing position-dependent cis-acting elements, which work in combination with the native regulatory sequences. The genes encoding the three human apolipoproteins E, CI, and CII are located in a 45-kb cluster on chromosome 19 (Smit et al. 1988) and encode proteins with the ability to associate with lipids (Li et al. 1988). The different apolipoproteins have distinct roles in lipid metabolism, where APOC1 is implicated to interact with apolipoprotein E in regulating the plasma lipid levels and in prolonging the residence time of lipoprotein particles in the circulation (Jong et al. 1999). Our analysis shows that all hominoids have the LTR integrated in the APOC1 locus, but it is absent in baboon.
By introducing the LTR in the baboon APOC1 locus, we observed an increased expression relative to that seen for the natural baboon locus. At the time of integration, it is possible that the LTR was tolerated because of either a neutral or a beneficial effect on individuals. Presumably, the presence of the LTR would have been selected against if it had a strong impact on APOC1 expression, resulting in hyperlipemic individuals (Li et al. 1988), which has been suggested as a possible explanation for silencing of a second APOC1 (the APOC1') gene in humans (Freitas et al. 2000). Although both the in vivo and transfection results suggest that the LTR has a moderate positive effect on the expression levels of APOC1, one possibility is that the LTR replaced an existing function, for example the silenced second APOC1 gene. Another possibility is that the LTR had a selective advantage when it was first acquired, for example in ensuring the export of lipoprotein to peripheral tissues, thereby maintaining important cellular functions during periods of limited food supply.

In contrast to the LTR at the APOC1 locus, a portion of the EDNRB transcripts are derived from the LTR promoter in placenta. The LTR also increases the activity of the native EDNRB promoter region in transient transfection experiments, suggesting that this LTR might have a dual role in acting both as promoter and enhancer for the expression of EDNRB in placenta. However, the distance of over 50 kb separating the LTR from the native EDNRB promoter makes it less likely that the LTR acts as an enhancer in the genome. In human placenta, endothelins (ETs) are implicated in the fetoplacental circulation via ETB and ETA receptors, and as growth factors of placental cells (Fant et al. 1992; Handwerger 1995).
The role for ETs and ET receptors in placental development is supported by studies in rats, where an increase in ET and ETB receptor density coincides with a rapid increase in placental growth (Shigematsu et al. 1996), whereas elevated ET concentrations are observed in cases of placental growth retardation (McMahon et al. 1993). Although the exact biological consequences of the interactions of ETs and the ET receptors in different parts of the placenta are complex and not well understood, our studies show that the LTR contributes significantly to expression of EDNRB. While the LTR-induced increase of EDNRB density in placenta might be an evolutionary event without physiological significance, another possibility is that an increased receptor density would serve as a clearance for the high levels of ETs that are present in the placenta, which in turn have implications in placental development and uteroplacental functions.

In summary, we have identified two HERV-E elements that mediate increased transcription of the EDNRB and APOC1 genes in humans by donation of promoter and enhancer functions from their LTRs and add to the list where LTRs have been co-opted to serve gene regulatory functions.

Chapter 3: Functional analysis of the endogenous retroviral promoter of the human endothelin B receptor gene

A manuscript entitled as above by J.-R. Landry and D. Mager is In Press in the Journal of Virology

3.1 Introduction
Endothelin receptor type B is a G protein-coupled, seven-transmembrane receptor encoded by the EDNRB gene on chromosome 13 (Arai et al. 1993). It is one of the two receptors by which the potent vasoactive effects of endothelins are mediated (Sakurai et al. 1990). In addition, mutations in EDNRB are likely implicated in the etiology of Hirschsprung disease (Chakravarti 1996; Carrasquillo et al. 2002), a multigenic congenital disorder characterized by the absence of ganglion cells along a segment of the intestine (OMIM 142623).
The EDNRB gene appears to be alternatively spliced as several transcript variants have been reported in the literature (Elshourbagy et al. 1996; Tsutsumi et al. 1999). At least four 5' isoforms have been described for this gene, which are likely derived from a common promoter region as their transcription initiation sites cluster within a 1 kb region. As discussed in chapter 2, we have characterized an additional 5' variant which does not originate from this shared promoter but instead initiates 57.5 kb further upstream from a HERV-E LTR. This retroviral element contributes to the expression of the EDNRB gene by providing a placenta-specific promoter and possibly an enhancer.

We described in chapter 2 that a different HERV-E LTR alternatively promotes in several tissues a gene involved in lipid metabolism, APOC1. Although the APOC1 and EDNRB associated HERV-E LTRs share a high sequence identity, their promoter activity appeared to vary in both strength and tissue-specificity. We hypothesized that sequence differences between the LTRs resulted in the presence of specific transcription factor binding sites which, for the EDNRB associated LTR, conferred strong placenta-restricted transcriptional activity. In this chapter, we confirm the tissue-specificity of both LTRs and dissect the regions in the EDNRB LTR that contribute to high placental expression.

3.2 Methods
3.2.1 Reverse transcription and Real-time PCR
Total RNA from human adult and fetal tissues was purchased from BD Biosciences Clontech. Following the elimination of remaining genomic DNA with DNase (Gibco BRL), first-strand cDNA was synthesized as described in chapter 2. Real-time PCR was performed on one-hundredth volume of cDNA using 25 µl of 2X SYBR Green PCR master mix (PE Applied Biosystems) and the following amplification conditions: 30 s at 95°C, 30 s at 55°C, and 30 s at 72°C for 35 cycles on a Bio-Rad iCycler.
For APOC1, primers 129 and 130 (located in exons 3 and 4 respectively) were used to amplify all APOC1 transcripts and primers 127 and 128 (located in the retroviral first exon, exon 1, and exon 3) to amplify only APOC1 transcripts containing HERV-E sequence. For EDNRB, primers 133 and 134 (located in exons 2 and 3 respectively) were used to amplify all EDNRB transcripts and primers 131 and 132 (located in the retroviral first exon, 1a, and exon 2) to amplify only EDNRB transcripts containing HERV-E sequence. All primer pairs used for real-time PCR are listed in Table 3.1 and were designed in accordance with the PE Applied Biosystems guidelines to ensure that the amplification efficiency of the different primer pairs was close to equal. For that purpose, all primer pairs amplified sequences of similar length (~100 bp). In addition, validation experiments were conducted to confirm that the amplification efficiencies of the products were very similar. Dissociation curves were run to detect nonspecific amplification and it was determined that single products were amplified in each reaction. The relative quantification of APOC1 and EDNRB chimeric and total expression was calculated using the comparative threshold cycle method (PE Applied Biosystems User Bulletin #2, ABI PRISM 7700 Sequence Detection System).

3.2.2 Sequence analysis
Pairwise alignment of the APOC1 and EDNRB 5' LTRs was performed by ClustalX version 1.8 (Thompson et al. 1994) and displayed using Genedoc version 2.6 (Nicholas et al. 1997). Putative transcription factor binding sites in the retroviral promoters were predicted using Alibaba2.1 (http://wwwiti.cs.uni-magdeburg.de/~grabe/alibaba2/) and the Transcription Element Search System (TESS) (http://www.cbil.upenn.edu/tess/index.html).
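The comparative threshold cycle calculation referred to in section 3.2.1 can be sketched as follows. This is a minimal illustration, not the exact procedure used in the study: the function name and the Ct values are hypothetical, and the formula assumes that the chimeric-specific and total primer pairs amplify with the same efficiency (a doubling per cycle by default), which is what the validation experiments described above were designed to check.

```python
def percent_chimeric(ct_chimeric, ct_total, efficiency=2.0):
    """Estimate the percentage of transcripts carrying the retroviral exon.

    ct_chimeric: threshold cycle with primers specific to the chimeric form
    ct_total:    threshold cycle with primers amplifying all transcripts
    efficiency:  fold amplification per cycle (2.0 means perfect doubling)
    """
    # A later threshold cycle means less starting template; each extra
    # cycle corresponds to an `efficiency`-fold lower abundance.
    delta_ct = ct_chimeric - ct_total
    return 100.0 * efficiency ** (-delta_ct)


# Hypothetical Ct values, for illustration only (not data from the thesis).
example = percent_chimeric(ct_chimeric=24.0, ct_total=22.0)
print(f"{example:.1f}% of transcripts carry the retroviral first exon")
```

Because both reactions are run on the same cDNA, no separate reference gene is needed for this ratio; the total-transcript amplification serves as the calibrator.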
3.2.3 Plasmid constructions
The retroviral promoter constructs were designed by cloning the 5' LTR of the EDNRB and APOC1 HERV-E elements into the KpnI/BglII sites of the pGL3 basic luciferase vector (Promega). The 462 bp LTRs were amplified from genomic DNA using primers 16 and 17 followed by a nested PCR reaction using primers 18 and 19 for the EDNRB LTR, and primers 13 and 14 for the APOC1 LTR. The region amplified and cloned for the EDNRB 5' LTR represents positions 59303 to 58842 of accession number AL139002.5 while the APOC1 5' LTR is present at positions 26537 to 26992 of accession number AF050154.1. All oligonucleotides used in plasmid construction are listed in Table 3.1.

Progressive 5' deletion constructs of the retroviral promoter were generated by amplifying the EDNRB LTR with the following primers: oligos 75 and 19 for the fragment from positions 241 to 462 of the LTR; oligos 78 and 19 for positions 191-462; oligos 79 and 19 for positions 131-462; oligos 109 and 19 for positions 122-462; oligos 86 and 19 for positions 111-462; and oligos 80 and 19 for positions 97-462. The resulting LTR sections were then cloned in the KpnI-BglII site of pGL3B.

For the mutation constructs Mut A, B, C, and D, in vitro mutagenesis was performed by amplifying positions 111 to 462 of the EDNRB LTR using oligo 19 and the following mutating oligos: oligo 108 to generate the mutations T124A, C125A, T126A, G127A and G128A (Mut A); oligo 110 for C118A, C119A and G121A (Mut B); oligo 111 for C115A and C117A (Mut C); and oligo 112 for G111A, C112A and C113A (Mut D). The mutated LTR fragments, representing positions 111-462, were then inserted in the multicloning site of the luciferase plasmid pGL3B.

The hybrid APOC1 and EDNRB LTR constructs were generated by digesting the LTRs cloned in pGL3B (see above) with a common restriction enzyme that cuts both LTRs once at the same position.
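The point-mutation notation used for the mutating oligos (for example C118A: the C at LTR position 118 replaced by A) can be checked mechanically against the primer sequences. The sketch below is illustrative: the helper name is hypothetical, and the wild-type segment is taken from the LTR-specific part of oligo 86 in Table 3.1, which the text states begins at position 111.

```python
import re


def apply_point_mutations(segment, start_pos, mutations):
    """Apply mutations such as 'C118A' to a segment whose first base sits
    at 1-based LTR position `start_pos`; returns the mutated sequence."""
    seq = list(segment.upper())
    for mutation in mutations:
        match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation}")
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        index = pos - start_pos
        if not 0 <= index < len(seq):
            raise ValueError(f"{mutation} lies outside the segment")
        if seq[index] != ref:
            # Guards against off-by-one numbering or a wrong reference base.
            raise ValueError(f"{mutation}: found {seq[index]} at position {pos}")
        seq[index] = alt
    return "".join(seq)


# LTR-specific part of oligo 86 (positions 111 onward), from Table 3.1.
WILD_TYPE_111 = "GCCACACCCTGGGTCTGGTAG"
# LTR-specific parts of mutating oligos 108 and 110, from Table 3.1.
OLIGO_108 = "GCCACACCCTGGGAAAAATAGTTAAAGATCGACCCCTG"
OLIGO_110 = "GCCACACAATAGGTCTGGTAGTT"

mutant_108 = apply_point_mutations(
    WILD_TYPE_111, 111, ["T124A", "C125A", "T126A", "G127A", "G128A"])
mutant_110 = apply_point_mutations(
    WILD_TYPE_111, 111, ["C118A", "C119A", "G121A"])

print(OLIGO_108.startswith(mutant_108), OLIGO_110.startswith(mutant_110))
```

The reference-base check is the useful part of the design: it fails loudly if a listed mutation does not match the base actually present at that position.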
The restriction enzymes PflMI (restriction site present at position 118 of the LTRs), SfcI (position 168), SphI (position 247) and SstI (position 358) were used to cleave the LTRs in 2 segments. Following digestion, restriction fragments from both LTRs were electrophoresed on a 1.5% agarose gel, purified using Qiaex II gel extraction kits (Qiagen) and the 5' fragments of one LTR were ligated to the 3' segments of the other. The resulting hybrid LTRs were then cloned in the KpnI-BglII site of pGL3B.

For the mutation constructs Mut E through K, in vitro mutagenesis was performed by amplifying the 3' part of the EDNRB LTR (position 159-462) using oligo 14 and the following mutating oligos: oligo 146 to generate mutation C177A (Mut E); oligo 147 for G190A (Mut F); oligo 148 for T209C, G213A and C215T (Mut G); oligo 149 for C177A, G190A, T209C, G213A and C215T (Mut H); oligo 145 for T176A, C177A, C178A, G180A (Mut I); oligo 144 for T185A, G186A, T187A, G190A, G191T (Mut J); and oligo 143 for T202A, G203A, T209A, C210A (Mut K). Following amplification, the 3' fragments of the LTR were digested using SfcI and BglII and purified using Qiaex PCR purification kits (Qiagen). Non-mutated 5' sections of the EDNRB LTR were also digested using KpnI and SfcI and purified as above. The mutated 3' parts of the LTRs were then ligated to 5' fragments of the EDNRB LTR and cloned in the KpnI-BglII site of pGL3B.

3.2.4 Cell culture and transient transfections
The human choriocarcinoma Jeg-3 cell line was maintained in RPMI supplemented with 5% fetal calf serum and antibiotics. The human colon cell line DLD-1 was cultured in αMEM supplemented with 10% fetal calf serum and antibiotics. The human glioma U87, lung carcinoma A549, liver carcinoma HepG2 and embryonic kidney 293 cell lines were maintained in DMEM supplemented with 10% fetal calf serum and antibiotics.
Cells were seeded 24 hours prior to transfection in 6-well plates at a density of 2 x 10^5 cells/well. Monolayers of U87, A549, Jeg-3 and 293 cells were cotransfected with 1.8 µg of plasmid DNA and 200 ng of the Renilla luciferase vector pRL-TK using 7 µl of Lipofectamine (Life Technologies). DLD-1 cells were cotransfected similarly using 4 µl of Lipofectamine (Life Technologies) and 6 µl of Plus Reagent (Life Technologies). HepG2 cells were cotransfected with 1.5 µg of plasmid DNA and 50 ng of the Renilla luciferase vector pRL-TK using calcium phosphate (Cellphect) as described by the supplier. All cells were washed 24 hours following transfection in phosphate-buffered saline (PBS) and harvested in 500 µl of 1X passive lysis buffer (Promega). Firefly and Renilla luciferase activities were measured using the Dual-Luciferase Reporter Assay System (Promega). The data were normalized to the internal Renilla luciferase control and expressed with respect to pGL3B (basic promoterless vector).
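The normalization described above (firefly signal corrected by the co-transfected Renilla control, then expressed as fold increase over pGL3B) can be sketched as follows. The function name and all readings are hypothetical, and pairing each construct with the pGL3B measurement from the same experiment is an assumption of this sketch rather than a detail stated in the text.

```python
from statistics import mean, stdev


def fold_over_basic(construct_reads, basic_reads):
    """Return (mean, SD) of fold activation over the promoterless vector.

    Each argument is a list of (firefly, renilla) readings, one tuple per
    independent experiment, paired by index.
    """
    if len(construct_reads) != len(basic_reads):
        raise ValueError("need one pGL3B reading per experiment")
    folds = []
    for (firefly, renilla), (b_firefly, b_renilla) in zip(construct_reads,
                                                          basic_reads):
        construct_ratio = firefly / renilla  # corrects transfection efficiency
        basic_ratio = b_firefly / b_renilla  # same correction for pGL3B
        folds.append(construct_ratio / basic_ratio)  # pGL3B is set to 1
    spread = stdev(folds) if len(folds) > 1 else 0.0
    return mean(folds), spread


# Hypothetical luminometer readings from two experiments (illustration only).
ltr_reads = [(52000.0, 400.0), (61000.0, 500.0)]
basic_reads = [(260.0, 400.0), (250.0, 500.0)]
fold_mean, fold_sd = fold_over_basic(ltr_reads, basic_reads)
print(f"{fold_mean:.0f} +/- {fold_sd:.0f} fold over pGL3B")
```

Dividing by the Renilla signal first means that wells with poor transfection do not masquerade as weak promoters.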
Table 3.1 Oligonucleotides used for APOC1 and EDNRB constructs and real-time study

Oligo  Description            Sequence
13     APOC1 LTR KpnI         cggggtaccTAAGGGAGGAGACCACCCCT
14     APOC1 LTR BglII        gaagatctTGTAGCAGGAGGAGCCGCAG
16     EDNRB LTR flanking     AACATCCTCTGTCTCTCC
17     EDNRB LTR flanking     CCAGTTCCTTCCCAGTGT
18     EDNRB LTR KpnI         cggggtaccTAAGGGAGGATACCACC
19     EDNRB LTR BglII        gaagatctTGTAGCAGGACAAGCTGC
75     EDNRB LTR Del 1        cggggtaccGTGCATGCAGCCCCCAGT
78     EDNRB LTR Del 2        cggggtaccGAAAAGCACTGTGAAAATTCC
79     EDNRB LTR Del 3        cggggtaccGTTAAAGATCGACCCCTGAC
80     EDNRB LTR Del 4        cggggtaccGCAGACAGTCTGGTGCCA
86     EDNRB LTR Del 3.5      cggggtaccGCCACACCCTGGGTCTGGTAG
108    EDNRB LTR Mut 4 (A)    cggggtaccGCCACACCCTGGGAAAAATAGTTAAAGATCGACCCCTG
109    EDNRB LTR Del 3.2      cggggtaccGGTCTGGTAGTTAAAGATC
110    EDNRB LTR Mut 5 (B)    cggggtaccGCCACACAATAGGTCTGGTAGTT
111    EDNRB LTR Mut 6 (C)    cggggtaccGCCAAAACCTGGGTCTGGTA
112    EDNRB LTR Mut 7 (D)    cggggtaccAAAACACCCTGGGTCTGGTA
127    APOC1 Ex1S             TCTGAGGACCCCACAGAGT
128    APOC1 Ex3A             GATCGACAGAACCACCACC
129    APOC1 Ex3S             GGTGGTGGTTCTGTCGATC
130    APOC1 Ex4A             CAGTGTGTTTCCAAACTCCTT
131    EDNRB Ex1S             GGGAAGGAACTGGTACTTGG
132    EDNRB Ex2A             ACTTGGAGGCGGCTGCATG
133    EDNRB Ex2S             GACCTGCTGCACATCGTCAT
134    EDNRB Ex3A             CAGCTTACACATCTCAGCTCC
143    EDNRB LTR Mut 8 (K)    GGTTATGTTATCTATAGATTCCAGACATTGTATGGAAAAGCACTGAAAAAATAACTGTCCTG
144    EDNRB LTR Mut 9 (J)    GGTTATGTTATCTATAGATTCCAGACATAAAATATAAAAGCACTGTG
145    EDNRB LTR Mut 10 (I)   GGTTATGTTATCTATAGATAAAAAACATTGTATGGAAA
146    EDNRB LTR Mut 12 (E)   GGTTATGTTATCTATAGATTACAGACATTG
147    EDNRB LTR Mut 13 (F)   GGTTATGTTATCTATAGATTCCAGACATTGTATAGAAAAGCACTG
148    EDNRB LTR Mut 14 (G)   GGTTATGTTATCTATAGATTCCAGACATTGTATGGAAAAGCACTGTGAAAATCCCTATTCTGTTCTGTTC
149    EDNRB LTR Mut 15 (H)   GGTTATGTTATCTATAGATTACAGACATTGTATAGAAAAGCACTGTGAAAATCCCTATTCTGTTCTGTTC

3.3 Results
3.3.1 Expression pattern and contribution of APOC1 and EDNRB chimeric transcripts
To further characterize the abundance and tissue-specificity of the chimeric mRNAs, we performed
real-time PCR on cDNAs from various tissues. The hybrid transcript forms were amplified using primers specific to the retroviral isoforms while total APOC1 and EDNRB transcripts were amplified using primers located in invariant exons found in all transcript forms. We determined the contribution of the LTR in driving the transcription of the APOC1 and EDNRB genes by calculating the percentage of overall transcripts that contained the retroviral first exon. As shown in Figure 3.1, the relative abundance of chimeric APOC1 transcripts was found to be highest in fetal brain, where chimeric forms represented nearly 40% of overall APOC1 transcripts, followed in prevalence by adult brain (cerebellum, brain), small intestine, spinal cord and mammary glands, in which retroviral isoforms constituted more than 10% of APOC1 mRNAs. In addition, a significant level (over 2%) of chimeric transcripts was also detected in several other tissues such as colon and testis. In contrast, the EDNRB retroviral isoform had a very low relative abundance (less than 1%) in all tissues tested with the exception of testis and placenta. While less than 5% of EDNRB mRNAs were chimeric in testis, 15% possessed a retroviral first exon in placenta. The placenta-specific abundance of the EDNRB retroviral isoforms compared to the relatively high levels of chimeric APOC1 transcripts in several tissues suggested a disparity in tissue-specificity and promoter strength for the two HERV-E LTRs.

Figure 3.1 Proportion of APOC1 and EDNRB transcripts contributed by the LTRs.
Total cDNA from various human tissues was subjected to real-time PCR using primers that amplified either all APOC1 and EDNRB transcripts or only those with a retroviral first exon (chimeric). The relative abundance of chimeric transcripts, expressed as the percentage of total cDNAs from each respective gene containing a retroviral 5' UTR, is depicted by bars +/- SD. Values are plotted for tissues in which more than 1% of APOC1 (grey bars) or EDNRB (black bars) mRNAs are chimeric.

3.3.2 Transcriptional activity of the retroviral promoters
Results from past experiments in which the APOC1 and EDNRB LTR reporter constructs had been tested in a liver and placenta cell line, respectively, indicated that the APOC1 LTR was a weak promoter while the EDNRB LTR had very strong transcriptional activity in the cell line tested (see chapter 2). Because of the new knowledge regarding the tissues in which the relative abundance of the chimeric transcripts was high, we decided to further analyze the promoter activity of the retroviral LTRs. We transiently transfected the same luciferase plasmids in which the LTRs had been inserted upstream of the reporter gene (see chapter 2) into various cell lines including those derived from tissues where the levels of retroviral isoforms were elevated. As shown in Figure 3.2, the APOC1 LTR construct had weak promoter activity in colon, brain, lung, liver, kidney and placental cell lines. The transcriptional activity of the APOC1 LTR taken out of its genomic context does not mimic the strength and tissue-specificity observed in vivo by real-time PCR, as high levels of chimeric transcripts were found in brain and colon. This suggests that although the APOC1 chimeric transcripts are transcribed by the HERV-E LTR, additional enhancing sequences not present within the retroviral promoter are required for expression.
[Figure 3.2: bar chart of relative luciferase activity (axis 0 to 350) for the Basic Luc, APOC1 1-462 and EDNRB 1-462 constructs in DLD-1 (colon), U87 (brain), A549 (lung), HepG2 (liver), 293 (kidney) and Jeg-3 (placenta) cells.]

Figure 3.2 Promoter activity of the APOC1 and EDNRB LTRs. Representation of the retroviral promoter constructs in which the APOC1 or EDNRB associated LTRs were inserted upstream of the promoter-less pGL3B vector and transiently transfected into DLD-1 (crossed bars), U87 (hatched bars), A549 (checkered bars), HepG2 (grey bars), 293 (white bars) and Jeg-3 (black bars) cell lines. The basic pGL3B vector and the SV40 promoter pGL3p plasmid were also transfected in the above cell lines. The luciferase activities obtained with each plasmid were corrected for transfection efficiency with the Renilla luciferase pRL-TK plasmid and are presented as fold increase over the activity of the basic (pGL3B) vector, which was assigned a value of 1. Each bar is the mean of the relative luciferase activity from at least 2 experiments +/- SD.

On the other hand, the activity of the EDNRB LTR construct in cell lines closely follows that observed in vivo. As shown in Figure 3.2, the transcriptional activity of the EDNRB LTR is very high in a placental cell line but nearly absent in other cell types. Similarly, retroviral EDNRB isoforms were shown to represent a high proportion of total transcripts in placenta. The disparity in promoter activity and tissue-specificity between the two LTRs is striking, as they belong to the same group of endogenous retroviruses, the HERV-E family, and are 85% identical. Figure 3.3 shows a comparison of the sequence of the 5' LTR of the HERV-E elements in the APOC1 and EDNRB loci. Since the LTRs were similar in sequence but not in transcriptional activity, we decided to dissect the EDNRB retroviral promoter in order to identify regions that conferred high promoter activity in placental cells.
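The normalization described in the legend of Figure 3.2 (correction for transfection efficiency with the Renilla co-reporter, expression as fold increase over the promoter-less vector, then mean +/- SD over experiments) can be written out as a short sketch. The function names and the layout of the readings are assumptions made for illustration; the raw luminometer values used in the thesis are not given.

```python
from statistics import mean, stdev


def normalized(firefly, renilla):
    """Firefly signal corrected for transfection efficiency by the Renilla co-reporter."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_over_basic(construct, basic):
    """Fold increase of a construct over the promoter-less vector within one experiment.

    Both arguments are (firefly, renilla) readings; the basic vector is defined as 1.
    """
    return normalized(*construct) / normalized(*basic)


def relative_luciferase(experiments):
    """Mean and SD of the fold increase across independent experiments.

    `experiments` is a list of (construct_reading, basic_reading) pairs (assumed layout).
    """
    folds = [fold_over_basic(c, b) for c, b in experiments]
    sd = stdev(folds) if len(folds) > 1 else 0.0
    return mean(folds), sd
```

Computing the fold increase within each experiment before averaging keeps day-to-day variation in the basic vector signal from inflating the SD of the construct.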
[Figure 3.3: pairwise nucleotide alignment of the EDNRB and APOC1 5' LTRs, with the LPE1 and LPE2 regions marked.]

Figure 3.3 Sequence comparison of the retroviral promoters of the EDNRB and APOC1 genes. Pairwise alignment of the 5' LTRs of the APOC1 and EDNRB associated HERV-E elements. The shaded regions represent nucleotides that are identical between the two LTRs. Nucleotide numbering starts from the first position of the LTR and the putative TATA boxes are indicated. The identified placental enhancers, LPE1 and LPE2, are underlined.

3.3.3 Mapping of the EDNRB LTR promoter

To begin the characterization of the retroviral promoter, we generated a series of 5' deletion luciferase constructs of the EDNRB LTR. Transfections of these plasmids were carried out in Jeg-3 cells, as the LTR promoter appeared to be particularly well utilized in this cell line. Our deletion analysis, shown in Figure 3.4, demonstrated that a region present between positions 111 and 122 of the EDNRB LTR was necessary for high promoter activity.
While a construct containing positions 111-462 of the EDNRB LTR resulted in a relative luciferase level of nearly 300, the activity obtained with a similar plasmid containing positions 122-462 was reduced to a level of less than 5. Therefore the removal of the 11 bp from 111-122 resulted in a 60-fold reduction in activity.

In silico analysis of this small region revealed a potential Sp1 binding site situated between positions 111 and 120. The importance of the predicted Sp1 binding site was confirmed by site-directed mutagenesis (see Figure 3.5). While mutations of several nucleotides between positions 124 and 130 (Mut A) did not significantly reduce the reporter activity, the mutagenesis of 2 or 3 nucleotides between positions 111 and 122 (Mut B, C, D) severely decreased the promoter activity. Combined, the results obtained by the mutation and deletion analyses indicate that the nucleotide sequence between positions 111 and 122 of the EDNRB LTR is required for strong promoter activity in placental cells. Although transcription factor binding site software identified a putative Sp1 binding site in this motif, the identity of the protein could not be confirmed by electrophoretic mobility shift assay using an Sp1 antibody. While specific binding of this region by a Jeg-3 nuclear extract was competed by an Sp1 consensus oligonucleotide, the complex could not be supershifted by an Sp1 antibody (results not shown). The functional domain isolated is therefore likely bound by other members of the Sp1 superfamily. Since the identity of the transcription factor was not resolved, the motif present between positions 111-122 of the EDNRB LTR will be referred to as LPE1, for LTR placental enhancer 1.

[Figure 3.4: schematics of the 5' deletion constructs, beginning with 1-462, with relative luciferase activity (axis 0 to 350).]

Figure 3.4 Effects of 5' deletions on the transcriptional activity of the EDNRB LTR. Representation of the 5' LTR deletion plasmids transiently transfected in the Jeg-3 choriocarcinoma cell line. All constructs contain variable lengths of the LTR. The name of each plasmid indicates the positions of the LTR included. Plasmid 1-462 corresponds to the entire LTR and is identical to the LTR promoter construct of Figure 3.2. Results are illustrated as in Figure 3.2.

[Figure 3.5: (A) schematics of the 111-462 constructs Del 111, Mut A, Mut B, Mut C and Mut D with relative luciferase activity (axis 0 to 350); (B) sequences of positions 111-130:]

Del 111  111-gccacaccctgggtctggta-130
Mut A    111-gccacaccctgggAAAAAta-130
Mut B    111-gccacacAAtAggtctggta-130
Mut C    111-gccaAaAcctgggtctggta-130
Mut D    111-AAAacaccctgggtctggta-130

Figure 3.5 Confirmation of a cis-element between positions 111-122 of the EDNRB retroviral promoter. (A) Mutational analysis of positions 111-130 of the EDNRB LTR in Jeg-3 cells. All constructs contain positions 111-462 of the LTR. While the first plasmid (Del 111) does not have any alterations, the Mut constructs have been mutated at the positions indicated by the arrows. The motif present between positions 111-122 of the EDNRB LTR is referred to as LPE1, for LTR placental enhancer 1. The constructs are not to scale. (B) The sequence of nucleotides 111 to 130 in each construct is indicated and the LPE1 motif is boxed. Mutations are shown in bold capital letters.

3.3.4 Hybrid construct experiments

While the region 111-122 of the EDNRB LTR appears to be indispensable for high activity in Jeg-3 cells, this retroviral segment cannot be sufficient for high activity, as an identical motif is present in the same location in the APOC1 LTR (see Figure 3.3). Additional regions important for high promoter activity in the EDNRB LTR could not be delineated by 3' deletion experiments, as the likely TATA box is present toward the 3' end of the LTR.
Instead we dissected the domains required for promoter activity by creating hybrid LTRs using the sequences of the APOC1 and EDNRB LTRs. In the first set of hybrid constructs, shown in Figure 3.6(A), 3' sections of various lengths of the EDNRB LTR were replaced with the corresponding sequence from the APOC1 LTR. Transient transfection of these constructs in Jeg-3 cells followed by luciferase assays indicated that the second half of the EDNRB LTR, from positions 247 to 462, could be replaced by the APOC1 section without any reduction in transcriptional activity. However, further replacements, such as from 168 to 462, completely abolished the promoter activity, suggesting that another region critical for high transcription was present between positions 168 and 247. The importance of this region was confirmed by complementary constructs in which 3' segments of the APOC1 LTR were replaced with the appropriate EDNRB sections. As shown in Figure 3.6(B), substituting the 3' half of the APOC1 LTR, from positions 247 to 462, with the corresponding EDNRB LTR sequence did not result in any improvement in transcriptional activity, while swapping an additional 79 base pairs, from 168 to 247, increased the promoter activity to levels obtained with the full-length EDNRB LTR.

[Figure 3.6: schematics and relative luciferase activity (axis 0 to 700) of (A) Endo 1-462 and the Endo-Apo hybrid constructs and (B) the Apo 1-462, Apo 358 Endo, Apo 247 Endo, Apo 168 Endo and Apo 118 Endo constructs.]

Figure 3.6 Fusion study of the APOC1 and EDNRB LTRs. Schematics of the hybrid APOC1-EDNRB constructs transfected in Jeg-3 cells to delimit the regions necessary for high promoter activity. The fusion constructs were designed using common restriction sites present in both LTRs and are named based on the position at which the LTR section was swapped.
For example, the construct Endo 358 Apo contains positions 1-358 of the EDNRB associated LTR, followed by positions 359-456 of the APOC1 LTR. The black sections of the arrows (LTRs) represent sequence from the EDNRB LTR while the grey regions are from the APOC1 LTR.

3.3.5 Site-directed mutagenesis of the EDNRB LTR

We performed sequence analysis using different transcription factor binding site software to identify possible trans-acting factors that may bind the second functional retroviral segment of the EDNRB LTR, located between positions 168-247, which we have called LPE2 (LTR placental enhancer 2). Using this approach, we identified three candidate binding sites for transcription factors previously found to be involved in placental expression: the heterodimer E47/Thing1 (Knofler et al. 2002), Oct-1 (Wang and Melmed 1998; Cheng 2001), and NF-kappaB (Wang et al. 2001). These motifs within LPE2 will be referred to as A, B, and C, as the identity of the proteins that bound to them was not confirmed. Interestingly, for each of these putative binding sites identified in the EDNRB LTR between positions 168 and 247, nucleotide differences exist between the APOC1 and EDNRB LTRs: one in motif A, another in B and three in site C (Figures 3.3 and 3.7). To investigate a possible role for these binding sites, and to determine whether the sequence variation between the APOC1 and EDNRB LTRs at these sites accounted for the disparity in promoter strength, we modified the EDNRB LTR by mutating the positions that differed between the two LTRs within the putative transcription factor binding sites to the sequence present in the APOC1 LTR. Individually, the changes in each motif to mimic the APOC1 LTR sequence only reduced the reporter activity by half (see Figure 3.7(A), Mut E, F, G). However, in combination (Mut H), the EDNRB to APOC1 changes resulted in the ablation of promoter activity.
Interestingly, the complete removal of any of the putative transcription factor binding sites, by mutation of several nucleotides, also significantly reduced the promoter activity (see Figure 3.7(B)). These results suggest that all three binding sites are important and that the proteins which bind to them act in combination. While the transcription factors might have bound to the APOC1 sequence if only one of the three sites differed with respect to the EDNRB LTR, the combination of the variant sites likely results in severely reduced binding efficiency, which then leads to the decrease in promoter activity.

[Figure 3.7: (A) and (B) schematics of the full-length EDNRB LTR luciferase constructs Endo and Mut E to Mut K, showing LPE2 motifs A, B and C between positions 175 and 215, with relative luciferase activity; (C) sequences of positions 175-215:]

Endo   175-ttccagacattgtatggaaaagcactgtgaaaattcctgtc-215
Mut E  175-ttAcagacattgtatggaaaagcactgtgaaaattcctgtc-215
Mut F  175-ttccagacattgtatAgaaaagcactgtgaaaattcctgtc-215
Mut G  175-ttccagacattgtatggaaaagcactgtgaaaatCcctAtT-215
Mut H  175-ttAcagacattgtatAgaaaagcactgtgaaaatCcctAtT-215
Mut I  175-tAAAaAacattgtatggaaaagcactgtgaaaattcctgtc-215
Mut J  175-ttccagacatAAAatATaaaagcactgtgaaaattcctgtc-215
Mut K  175-ttccagacattgtatggaaaagcactgAAaaaatAActgtc-215

Figure 3.7 Mutational analysis of positions 175 to 215 of the EDNRB retroviral promoter. All constructs with mutations between 175-215 contain the full-length EDNRB LTR and were transfected in Jeg-3 cells. (A) In constructs Mut E, F, G, H, the arrows represent nucleotides that have been replaced so as to have the same sequence as in the APOC1 LTR. (B) In constructs Mut I, J, K, the Xs depict putative transcription factor binding sites that have been removed by 3 to 5 mutations. (C) The sequence of nucleotides 175 to 215 in each construct is indicated and the LPE2 motifs A, B and C are boxed. Mutated nucleotides are shown in bold capital letters.
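The substitutions carried by each construct in Figure 3.7(C) can be listed by comparing each mutant sequence with the wild-type EDNRB sequence, position by position. The sketch below is illustrative only: the wild-type string is reconstructed from the figure text (its extracted form is partly garbled, so it is taken as the consensus of the unmutated positions of the mutant constructs), and the function and variable names are hypothetical.

```python
LPE2_START = 175  # LTR position of the first nucleotide shown in Figure 3.7(C)

# Wild-type EDNRB LTR, positions 175-215 (assumed reconstruction, see lead-in).
ENDO = "ttccagacattgtatggaaaagcactgtgaaaattcctgtc"

# Mutant constructs as printed in the figure; capitals mark mutated nucleotides.
MUTANTS = {
    "Mut E": "ttAcagacattgtatggaaaagcactgtgaaaattcctgtc",
    "Mut F": "ttccagacattgtatAgaaaagcactgtgaaaattcctgtc",
    "Mut G": "ttccagacattgtatggaaaagcactgtgaaaatCcctAtT",
    "Mut H": "ttAcagacattgtatAgaaaagcactgtgaaaatCcctAtT",
    "Mut I": "tAAAaAacattgtatggaaaagcactgtgaaaattcctgtc",
    "Mut J": "ttccagacatAAAatATaaaagcactgtgaaaattcctgtc",
    "Mut K": "ttccagacattgtatggaaaagcactgAAaaaatAActgtc",
}


def substitutions(wild_type, mutant, start=LPE2_START):
    """List (LTR position, wild-type base, mutant base) for every mismatch.

    The comparison ignores case, since case only marks mutations in the figure.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        (start + i, w.lower(), m.lower())
        for i, (w, m) in enumerate(zip(wild_type, mutant))
        if w.lower() != m.lower()
    ]


if __name__ == "__main__":
    for name, seq in MUTANTS.items():
        print(name, substitutions(ENDO, seq))
```

With the sequences as transcribed here, Mut E, Mut F and Mut G carry one, one and three substitutions respectively, in line with the one, one and three differences between the two LTRs reported for motifs A, B and C, and Mut H combines all five.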
3.4 Discussion

Regulatory sequences participating in placental-restricted transcription have been identified for a number of genes (Bi et al. 1997; Wang and Melmed 1998; Yamada et al. 1999; Cheng 2001; Watanabe et al. 2001). In some cases, the proteins that confer tissue-specificity appear to be ubiquitously expressed transcription factors. For example, important motifs in the placenta-specific promoter of the human gonadotropin-releasing hormone receptor gene (GnRH) have been found to interact with the common trans factors Oct-1, CRE, GATA and AP1 (Cheng 2001). In other reported placental-specific elements, the transcription factors involved are preferentially expressed in the placenta. A novel placenta-restricted trans-factor was shown to bind a placenta-specific element (PSE), with sequence CATGGCCTGAACTAGTTTT, in the enhancer of the human leukemia inhibitory factor receptor (LIFR) gene (Wang and Melmed 1998). Other placental DNA binding proteins which have been identified are the TSE binding protein, which recognizes the core sequence RNCCTNNRG in the trophoblast-specific element (TSE) of the aromatase cytochrome P450 gene (Yamada et al. 1995), and the protein hGCMa, which binds to a second element, TSE2 (CATAAGACCCTCATTCCAGAGG), in the human aromatase gene (Yamada et al. 1999). This last tissue-restricted protein, hGCMa, has also been shown to recognize the PLE1 (placental leptin enhancer) element (CAGTACCCTCAGGCTTACTA-GGGTGGTGAAAAACTC) in the placental promoter of the leptin gene (Bi et al. 1997; Yamada et al. 1999). Interestingly, the identified cis-element PLE1, as well as another placental transcription factor binding site, PLE3 (CCTGGTAAATTTGTGGTCAGACCAGTTTTCTGCTCT), were shown to reside within a retroviral HERV-K LTR (Bi et al. 1997).
The protein hGCMa was recently found to activate the expression of HERV-W encoded syncytin in placental cells (Yu et al. 2002).

Other retroviral sequences have been shown to play important roles in regulating the placental expression of human genes, as was discussed in chapter 1. A HERV-E element was determined by our group to act as an alternative placental promoter for the MID1 gene, which will be discussed in chapter 4. Another member of the HERV-E family has also been suggested to contribute to the placenta-specific transcription of the human pleiotrophin (PTN) gene (Schulte et al. 1996). An Sp1 binding site in the retroviral PTN enhancer was found to be essential for the placental expression of this promoter (Schulte et al. 2000). We now report the isolation of two critical regions in the retroviral promoter of the human EDNRB gene that are necessary for strong placental transcriptional activity. The first identified motif, which we have named LPE1, is present between positions 111 and 122 of the LTR. Transfection experiments with deletion and mutation constructs indicate that the LPE1 region is essential for promoter activity and stimulates transcription by 60-fold. However, LPE1 is not sufficient to confer strong placental-specific transcriptional activity, as another HERV-E LTR, associated with APOC1 transcription, contains the LPE1 motif but represents a weak promoter in placenta. Interestingly, like the HERV-E placental enhancer of the PTN gene, LPE1 was also predicted to contain an Sp1 binding site. Results from mobility shift assays suggest that, while the Sp1 protein does not appear to bind LPE1, Sp1-related proteins likely interact with it. A second positive regulatory element, LPE2, was mapped between positions 168 and 247 of the EDNRB LTR.
This region appeared to be sufficient to confer high activity in placental cells, as replacing the corresponding segment in the APOC1 LTR with LPE2 increased the promoter strength from nearly zero to levels on par with the EDNRB LTR. LPE2 was predicted to contain three binding sites, referred to as A, B, and C, for proteins which had been shown to participate in the placental transcription of genes. These were the heterodimer E47/Thing1, Oct-1, and NF-kappaB, respectively. The putative binding site for Thing1, which is also known as Hand1, was especially interesting, as this protein has been demonstrated to have a tissue-restricted expression pattern in placenta and heart and had been shown to be important in placentation (Riley et al. 1998). A putative site for the Hand1/E47 heterodimer was found in the LPE2 region of the EDNRB LTR between positions 173 and 188. Although mutations confirmed the importance of this site, we were unable to confirm binding of Hand1 or E47 to LPE2 by electrophoretic mobility shift assays, as the addition of antibodies to either of these two proteins did not result in supershifts (results not shown). It is possible that novel proteins might interact with LPE2 to confer strong placental-restricted expression.

The LPE1 and LPE2 regions of the EDNRB LTR appear to be conserved, which supports an important role for these cis-elements. An identical LPE1 motif was present in the EDNRB LTR of gorilla, chimpanzee, orangutan, gibbon and baboon (results not shown). With the exception of chimpanzee, which had one nucleotide difference, the sequence of the LPE2 motif was also the same in the above species across 40 bp (results not shown). To determine whether other HERV-E retroviruses besides the EDNRB-associated element possessed the LPE1 and LPE2 elements, we analyzed LTRs derived from a HERV-E phylogenetic study (J.-R. Landry, unpublished).
A survey of the 60 HERV-E LTRs with the highest identity to EDNRB, which included the APOC1 LTR, found LPE1 sequences at the same position in 20 retroviral elements. The first binding site in LPE2, A, was found in 5 HERV-E elements, while the second, B, was present in 10 HERV-E LTRs, including 3 of the 5 which also possessed binding site A. Finally, only one HERV-E LTR contained motif C in the same position as in the EDNRB LTR, but it did not have either of the A or B binding sites. The LPE1 cis-element therefore appears to be abundant in HERV-E retroviral elements, while the complete LPE2 motif could not be identified in any other LTR.

In summary, we have confirmed the placental-specificity of the retroviral promoter of the human EDNRB gene and characterized motifs important in its tissue-restricted expression. We have shown that the identified LPE1 and LPE2 regions of the HERV-E element are critical for the strong placental transcriptional activity of the EDNRB LTR. Our results illustrate the complexity and diversity of mechanisms by which endogenous retroviral sequences can contribute to the transcription of human genes.

Chapter 4: The Opitz syndrome gene MID1 is transcribed from a human endogenous retroviral promoter

A paper by J.-R. Landry, A. Rouhi, P. Medstrand and D.L. Mager, entitled as above, was published in 2002 in the journal Molecular Biology and Evolution, 19: 1934-1942. A. Rouhi contributed to the deletion analysis of the MID1 retroviral promoter. This work is summarized in Figure 4.4.

4.1 Introduction

Through the bioinformatic analysis of sequence databases, as described in chapter 2, we discovered a chimeric transcript for a third human gene. The fusion mRNA identified contained the LTR of a HERV-E element linked to the Opitz syndrome (OS) gene MID1. Mutations in MID1 are responsible for the X-linked form of OS (Quaderi et al. 1997; Gaudenz et al.
1998), a genetic disorder that primarily affects the development of midline structures (Opitz 1987; Robin et al. 1996). MID1 is a single copy gene that encodes a microtubule associated protein (Cainarca et al. 1999; Schweiger et al. 1999) which has been reported to associate with phosphatase 2A (Liu et al. 2001) and target it for degradation (Trockenbacher et al. 2001). While several studies have focused on the function of the MID1 gene product, transcriptional regulation of the MID1 gene has received relatively little attention. Although multiple MID1 transcript sizes as well as several alternative 5' UTRs have been described (Quaderi et al. 1997; Perry et al. 1998; Van den Veyver et al. 1998; Cox et al. 2000), the promoter or promoters of MID1 remained uncharacterized. In this chapter, we show that a HERV-E element is involved in transcriptional regulation of the human MID1 gene and characterize the retroviral element that acts as an alternative tissue-specific promoter for this gene.

4.2 Materials and Methods

4.2.1 Database searches and sequence analysis

The human expressed sequence tag (EST) and non-redundant (nr) databases were screened by BLAST version 2.0 (Altschul et al. 1997) using HERV-E leader regions as query sequences to identify novel chimeric mRNAs. Transcripts identified were analysed by RepeatMasker with RepBase version 3.04 (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker) to assess whether they contained non-retroviral sequence, and potential hybrids were further characterized by additional BLAST searches to determine the nature of the non-repetitive sequence. The analysis for transcription factor binding sites was performed using Alibaba2.1 (wwwiti.cs.uni-magdeburg.de/~grabe/alibaba2/).

4.2.2 Rapid amplification of cDNA ends

All oligonucleotides used for RACE and other procedures are listed in Table 4.1.
5' RACE was performed as described in chapter 2 unless otherwise noted below. The first round of PCR amplification was carried out using oligos 45 and AP1 (the latter provided by the supplier). The nested round of amplification was performed using oligos 46 and AP2.

4.2.3 Reverse transcription and real-time PCR

Total RNA from human adult and fetal tissues was purchased from Clontech and Stratagene or prepared from different sections of human placenta as previously described (Wilkinson et al. 1990). Following the elimination of remaining genomic DNA with DNase (Gibco BRL), first-strand cDNA was synthesized as previously described. Real-time PCR was performed as described in chapter 3 using the following oligonucleotides. Primers 48 and 49 (located in exons 2 and 3 respectively) were used to amplify all MID1 transcripts; primers 36 and 47 (located in the retroviral first exon, 1R, and exon 2) to amplify only MID1 transcripts containing HERV-E sequence; and primers 60 and 61 to amplify all GAPDH transcripts. Levels of MID1 amplification (all 5' UTR isoforms) were normalized to GAPDH and expressed relative to the level of MID1 transcripts in fetal spleen, which was given an arbitrary value of 1 as it contained the least amount of MID1 mRNA among the tissues tested.

4.2.4 Plasmid constructions

The retroviral promoter construct was designed by cloning the 5' LTR of the MID1 HERV-E element into the KpnI/BglII sites of the pGL3 basic luciferase vector (Promega). The 488 bp LTR was amplified from genomic DNA using primers 11 and 12, followed by a nested PCR reaction using primers 84 and 10. For the native promoter construct, a 1.1 kb fragment upstream of the non-retroviral MID1 first exon present in accession number Y13667 was amplified using primers 91 and 92 and inserted in the KpnI-BglII site of pGL3B. The promoter region represented positions -949 to +184 relative to the transcription start site of the non-retroviral MID1 5' UTR. MID1 LTR enhancer plasmids were constructed by amplifying the LTR from human genomic DNA using flanking oligos 11 and 12 and nested primers 101 and 102, then inserting the HERV-E LTR in either orientation in the BamHI site of the promoter construct. Progressive 5' deletion constructs of the retroviral promoter were generated by amplifying the LTR with the following primers: 85 and 10 for the fragment from positions 263 to 488 of the LTR, oligos 98 and 10 for positions 179-488, oligos 106 and 10 for positions 150-488, oligos 105 and 10 for positions 119-488, oligos 99 and 10 for positions 96-488 and oligos 113 and 10 for positions 51-488. The resulting LTR sections were then cloned in the KpnI-BglII site of pGL3B.

The marmoset MID1 native promoter plasmid was constructed by amplifying the region orthologous to the 1.1 kb human MID1 native promoter from marmoset cell line genomic DNA using oligos 91 and 92 and cloning the resulting product in the KpnI-BglII site of pGL3B. The LTR marmoset promoter construct was generated as described above for the LTR human promoter construct, using human genomic DNA for the amplification of the LTR, as the HERV-E element is not present in the marmoset MID1 locus. All constructs were sequenced to confirm orientation and sequence integrity.

4.2.5 Cell culture and transient transfections

HepG2, Jeg-3 and 293 cells were cultured and transfected as described in chapter 3.

4.2.6 Genomic typing for the MID1 HERV-E LTR

Genomic DNA was prepared from marmoset (New World Monkey), baboon, gibbon, orangutan, gorilla, chimpanzee and human cell lines as previously described (Goodchild et al. 1993).
Primers 12 (situated upstream of the HERV-E element) and 70 (situated in the 5' LTR) were used in the PCR amplification to detect the presence or absence of the HERV-E element in the MID1 locus of different primates. Following amplification, the PCR products obtained were hybridized to the human 5' MID1 HERV-E LTR to confirm the identity of the amplicons.

Table 4.1 Primers used for MID1 genomic and cDNA amplifications

Oligo  Sequence                           Description
10     gaagatctTGTAGCAAGACAAGCCGCAG       MID1 LTR BglII
11     CCCAAGACCAGTCCTGTGAAG              MID1 LTR flanking-1
12     CTCACCCTCAGAAGACTACTTG             MID1 LTR flanking-2
36     GTGGAGGATCAACGCAGTG                MID1 Ex1d-2
45     CAGTTCTGACTCCAGTGTTTCCATC          MID1 Ex2-3
46     CAGGCAAAGCTCTCTTGTGTCATC           MID1 Ex2-2
47     AACCCAAGGAAGCTGATCAG               MID1 Ex2-1
48     GGTGGCAGCTTTGAGTGAG                MID1 Ex2-4
49     TGGATGAGTTTAGCCAAAAGG              MID1 Ex3
60     GCCCAGGATGCCCTTGA                  MID1 Gapdh-1
61     GTGTCCCCACTGCCAAC                  MID1 Gapdh-2
70     AAGCCGCAGACAAAACTCCTC              MID1 LTR middle
84     cggggtaccTGAGAGAAGAGAGACAGACC      MID1 LTR KpnI
85     cggggtaccACCAGTGCATGCAGCCCCT       MID1 LTR-Del 1
91     cggggtaccCTCTTTGCTCAACTTGCACT      MID1 PromC KpnI
92     gaagatctGAAACACCGAACCCGACA         MID1 PromC BglII
98     cggggtaccACCCCTGACCTAGCAACTGA      MID1 LTR-Del 2
99     cggggtaccCTAAGAACCAGACGCGAAAC      MID1 LTR-Del 3
101    cgggatccTGAGAGAAGAGAGACAGA         MID1 LTR BamHI-1
102    cgggatccTGTAGCAAGACAAGCCGCAG       MID1 LTR BamHI-2
105    cggggtaccGGAACCAGACCTGAAACCA       MID1 LTR-Del 2.6
106    cggggtaccCCTGACCTAAGCCTGGTAG       MID1 LTR-Del 2.3
113    cggggtaccCAGAAAAGGAAAGAGAAGCAAAA   MID1 LTR-Del 4

4.3 Results

4.3.1 Identification and characterization of the MID1 chimeric transcript

The present study was initiated to identify endogenous retroviruses that participate in the regulation of human genes by providing promoter elements. As transcripts derived from LTR promoters contain retroviral sequence at their 5' termini, we searched the human EST and nr databases for hybrid (viral-cellular) mRNAs.
These searches revealed a chimeric  transcript in accession number AF041208 that had H E R V - E LTR-leader sequence at its 5' end but continued into the sequence of the second exon of MIDI gene. It should be noted that a similar chimeric transcript was isolated from a fetal kidney cDNA library by (Van den Veyver et al. 1998) who reported the presence of an L T R repeat in the 5' UTR of one their clones but did not characterize the origin or structure of the hybrid mRNA.  Database  analysis of the MIDI genomic region indicated that the H E R V - E element involved in the creation of the hybrid mRNA is approximately 6 kb in length and resides 16 kb upstream of the second exon of MIDI (see Figure 4.1).  Generation of the chimeric transcript resulted  from a splicing event between the natural splice donor site in the H E R V - E element and the splice acceptor site at the beginning of the second exon oi MIDI. As a result, the chimeric mRNA generated was nearly identical to a previously reported (Quaderi et al. 1997) nonchimeric MIDI transcript (in accession Y13667) with the exception of the first exon. Because the translation initiation site (ATG) of MIDI occurs in exon 2, the chimeric transcript likely encodes a normal MIDI protein while possessing an alternative 5' UTR. We  confirmed the existence and transcriptional initiation site of this chimeric  transcript by performing 5' R A C E on placental cDNA.  Sequence analysis of 19 R A C E  clones led to the identification of 7 hybrid clones as well as 12 non-retro viral clones  101  A NATIVE PROMOTER  TATA  ATG  SD  1  i  ^/6kb/^~T-  B -475  .. . 
gcaccaccacccagacttcccaagctcttctcttggtctctgctatagctacaatagcaactttattctcctacaac  -397  tgagagaagagagacagaccctctcatattgttgtatattgttttatactcagaaaaggaaagagaagcaaaactaaaggc ^  -316  aggtagcccggcgcctaagaaccagacgcgaaaccaaggaaccagacctgaaaccaggcctgggcctgcctgacctaagcc \  -235  tggtagttaaaattccacccctgacctagcaactgatgttatctatagattatagaaagacattgtaaaacttcccggtct  -154 -72 + 10  i  \ \  \  i  gttctgtttcactctaaccaccagtgcatgcagcccctgtcacgtaccccctgcttgctcaatcgatcacaaccctcttacg ' tggaccccccttagagttgtgagcccttaaaagggacaggaattgctcactcggggagctcggctcttgaga  CAGGAGTCT ,  TGCTGATGCCTCTGGCCAAATAAACCCCTTTCTTCTTTATCTCGGTGTCTGAGGAGTTTTGTCTGCGGCTTGTCTTGCTACAi  + 92  TTTCTTGGTTCCCTGACCAGGAAGCGAGGTGATTAACAGACGGTTGAGGCAGCTCCTTAGGTGGCTTTAGCCTGCCCTGTGG  + 174  AACATCCCTGCGGGGGACTCCAACCAGCCAGAGCGACGCGGATCCTGAGAGCGCTCCCGGGTAGGCATTTGCCCAGGTGGGA  + 256  CGCCTCGCCAGAGCCGTGTGTGGCAGGCCCCCGTGGAGGATCAACGCAGTGGCTGAACACTGGGAAGGAACTGGCACTTGGA  +338  GTCCAGACATCTAAAACTTGI  +420  cggcatgcctttatcggcactttggttttggttttgacttggtttgaattgcttgacggaactggtcttgggaacttgc  gtaagactagtctttggaacttgcccactccatttgtgtggaagcgtggtctgatcaccca t  Figure 4.1 Characterization of human retroviral-Af/Z)i chimeric transcripts. (A) The position of the H E R V - E retroviral element, depicted as a rectangle flanked by two arrows, is shown with respect to the second exon of MIDI, represented by a box. The location ofthe putative T A T A box is indicated in the L T R (arrow) and the splice donor site is indicated by SD. The H E R V - E 5' L T R is situated over 20 kb upstream ofthe second exon of MIDI. The native promoter is represented by a circle and is located 30 kb upstream of the retroviral element. The translation initiation site is shown at the beginning of exon 2. Two alternative MIDI cDNA isoforms are schematically illustrated below the diagram of the genomic locus. The retroviral first exon is denoted as 1R and IN indicates the native first exon previously reported (Quaderi et al. 1997). 
Both alternative first exons splice to the second exon using a common splice acceptor site at the beginning of exon 2. (B) Genomic sequence of the HERV-E element in the MID1 locus (sequence represents positions 20480 to 19510 in accession AC079314.26). The 5' LTR is enclosed by a dashed pointed rectangle and the putative TATA box is shown in bold. The retroviral first exon sequence (based on the 3 longest 5' RACE clones) is shown in uppercase and framed.

containing heterogeneous sequences, supporting the existence of alternative first exons for MID1. Moreover, the 3 longest of the 7 hybrid clones all started 34 nucleotides further 5' than the original EST clone AF041208. As shown in Figure 4.1B, the 3 longest 5' RACE clones initiated 40 bp downstream of the putative retroviral TATA box in the LTR. These results strongly suggest that the MID1 mRNA isoform is transcribed from a retroviral promoter.

4.3.2 Tissue specificity of the chimeric MID1 transcript

The 5' RACE results as well as database searches suggested the presence of heterogeneous first exons for MID1 as well as alternative promoters in addition to the retroviral LTR. To investigate the expression pattern of the chimeric transcript and to determine the contribution of the retroviral promoter in driving expression of MID1 transcripts, we performed real-time quantitative RT-PCR on cDNA from placenta and various fetal tissues, as preliminary data suggested the level of chimeric transcripts to be highest in those tissues. Amplifications were carried out using a chimeric transcript-specific primer pair (primers 36 and 47) to detect MID1 mRNAs derived from the retroviral element. Additional PCRs with primers from a region of MID1 common to all isoforms (primers 48 and 49) and from GAPDH were used to assess the total level of MID1 transcripts originating from all promoters as well as the quality of the RNA.
The real-time RT-PCR data revealed the LTR promoter to be tissue-specific. Chimeric mRNAs were found to constitute a significant proportion of total MID1 transcripts in placenta and fetal kidney, where they represented 25% and 22% of overall MID1 mRNAs respectively (see Figure 4.2). Further analysis on placental sections indicated high LTR promoter activity in chorion, decidua and villi, where 24%, 16% and 38% respectively of MID1 transcripts contained the retroviral 5' UTR variant (Figure 4.2).

Figure 4.2 Comparison of overall MID1 and chimeric MID1 expression levels. The relative abundance of MID1 chimeric transcripts compared to overall MID1 mRNA as measured by real-time PCR is shown. Total cDNAs from various human tissues were subjected to PCR using different sets of primers to detect either all MID1 transcripts, chimeric MID1 or GAPDH mRNA. Total MID1 levels relative to GAPDH are depicted by bars. The black portion represents the percentage (written above the bars) of overall MID1 mRNAs that possess retroviral first exons. The relative abundance of MID1 isoforms in fetal tissues and placenta is illustrated in panel (A) and the levels in placenta sections in panel (B).

4.3.3 Functional analysis of the retroviral promoter

In order to establish that the putative MID1 retroviral promoter was functional, a reporter construct containing the LTR was transfected into the human choriocarcinoma cell line Jeg-3, the embryonic kidney cell line 293 and the liver cell line HepG2. Concordant with the real-time chimeric expression results, the data obtained with the transient transfections suggest that the MID1 HERV-E LTR is a powerful promoter in placenta and embryonic kidney (see Figure 4.3). Indeed, in the placental cell line Jeg-3, the activity of the LTR was 5 times stronger than the SV40 promoter, while in the embryonic kidney cell line the LTR was nearly as strong as SV40.
However, in the liver cell line, the LTR had much weaker activity, suggesting tissue specificity of the LTR promoter.

Figure 4.3 Promoter and enhancer activity of the retroviral LTR. The left part of the figure shows the various constructs used to transiently transfect the HepG2 human liver cell line (white bars), the 293 human embryonic kidney cell line (black bars) and the Jeg-3 human choriocarcinoma cell line (grey bars). The luciferase activities are corrected for transfection efficiency with the Renilla luciferase pRL-TK plasmid and are presented as percentages of the activity of the pGL3p vector. Each bar is the mean of the relative luciferase activity from at least 2 experiments +/- SD.

We generated additional luciferase reporter constructs to examine whether the retroviral sequence could also provide functional enhancing activity to an adjacent alternative MID1 promoter. As no functional studies have been performed to date on MID1 promoters, we cloned a 1.1 kb region upstream of the non-retroviral MID1 first exon present in accession number Y13667 (the cloned sequence represents nucleotides 50444-51575 in accession number AC079314.26). We tested the promoter activity of this region in the above three cell lines and then determined whether the presence of the LTR upstream of this MID1 promoter influenced the expression level of the reporter gene. As shown in Figure 4.3, this region had strong promoter activity in 293 and weaker activity in Jeg-3 and HepG2. Interestingly, these experiments suggested that the native promoter activity was somewhat enhanced in Jeg-3 cells when the LTR was inserted upstream of the native promoter in either orientation. In addition, we also determined (results not shown) that the LTR enhanced activity of the SV40 promoter when inserted in either orientation.
These experiments suggest that the LTR plays a functional cellular role in both placenta and embryonic kidney by acting as an alternative promoter and possibly also by enhancing the activity of the non-retroviral promoter.

4.3.4 Deletion study of the retroviral promoter

To further characterize the retroviral promoter, we generated a series of 5' deletion constructs of the LTR. Because the LTR appears to be well utilized in placenta, we conducted our transfections in the placenta cell line, Jeg-3. As shown in Figure 4.4, successive deletions from the 5' end of the LTR resulted in a continuous decrease in promoter activity, suggesting the presence of several positive regulatory elements upstream of the TATA box, which is located at position 352 of the LTR. A four-fold difference in promoter activity was observed between constructs 51-488 and 96-488, suggesting a role for the region between positions 51 and 96 of the LTR (see Figure 4.1) in conferring high promoter activity. Examination of this section of the LTR revealed the presence of several transcription factor binding sites including GATA1, members of the ETS-type family, interferon response factors, C/EBP, AP2 and Sp1. However, no consensus binding sites were identified in that region for placenta and/or embryonic kidney specific transcription factors, suggesting that another region of the LTR confers tissue specificity to the LTR while positions 51 to 96 enable high promoter activity.

Figure 4.4 Deletion analysis of the retroviral LTR promoter. Representation of the 5' LTR deletion plasmids transiently transfected to the Jeg-3 cell line. Numerical designation of each construct is based on assigning the first nucleotide of the LTR a position of 1. The plasmid 1-488 corresponds to the full length LTR and is identical to the LTR promoter construct of Figure 4.
The luciferase activities shown are corrected for transfection efficiency with the Renilla luciferase pRL-TK plasmid and are presented as percentages of the activity of the pGL3p vector. Each bar is the mean of the relative luciferase activity from at least 2 experiments +/- SD.

4.3.5 Analysis of the integration time and potential impact of the retroviral element

We performed PCR on genomic DNA from various primates to determine the evolutionary age of the MID1 HERV-E element. Using a sense primer located upstream of the retroviral integration site in humans as well as an antisense primer present within the 5' LTR of the HERV-E element, we confirmed the presence of the retroviral element in human, chimpanzee, gorilla, orangutan, gibbon and baboon but were unable to amplify the element in marmoset (a New World Monkey) (see Figure 4.5). We believe that the absence of amplified retroviral product in marmoset indicates the lack of the HERV-E element at that location and is not the result of rearrangement in that region, as we were able to amplify nearby marmoset sequence upstream as well as downstream of the orthologous retroviral integration site. Since New World Monkeys diverged from Old World Monkeys (baboon) approximately 30-40 million years ago (Sibley and Ahlquist 1987), our results suggest that the HERV-E in the MID1 locus integrated over 30 million years ago. Comparison of the 5' and 3' LTRs of the MID1 HERV-E element revealed that they are 93% identical. If one assumes a mutation rate of 0.15-0.21% per bp per million years (Li and Tanimura 1987; Tristem 2000), this level of LTR identity roughly agrees with the integration time obtained from the genomic PCR studies.
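As a rough check on the agreement noted above, the divergence between the two LTRs can be converted into an age range. The sketch below is an illustration only: it assumes that the quoted rate of 0.15-0.21% per million years is applied directly to the observed 7% divergence between the 5' and 3' LTRs, without any correction for multiple substitutions, which the text does not state explicitly. The function name is hypothetical.

```python
def ltr_age_range_myr(identity_pct, rate_low_pct, rate_high_pct):
    """Return (youngest, oldest) age estimates in million years.

    Assumption (not taken from the text): the rate is expressed as percent
    divergence between the two LTRs per million years and is applied
    without correction for multiple hits.
    """
    divergence_pct = 100.0 - identity_pct
    youngest = divergence_pct / rate_high_pct  # faster rate gives a younger age
    oldest = divergence_pct / rate_low_pct     # slower rate gives an older age
    return youngest, oldest


youngest, oldest = ltr_age_range_myr(93.0, 0.15, 0.21)
print(f"{youngest:.0f}-{oldest:.0f} million years")
```

Under these assumptions the estimate is roughly 33 to 47 million years, which is in the same range as the integration time of over 30 million years inferred from the primate PCR survey. If the rate were instead interpreted per lineage, both estimates would be halved.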
To try to assess the possible regulatory impact of the retroviral element at the time of integration, we decided to measure the effect of the HERV-E LTR on the non-retroviral MID1 promoter of a species which lacks the HERV-E LTR in the MID1 locus. We therefore generated a MID1 luciferase construct containing the marmoset MID1 region which is orthologous to the human 1.1 kb native promoter tested above. The marmoset region was amplified using the human primers 12 and 70 as the marmoset region had not been sequenced. Upon sequencing of the marmoset segment, it was found to be very similar to the human region (90% identity). We then inserted the human LTR upstream of the marmoset segment in the reporter construct to determine if the LTR would affect marmoset promoter activity. The transfection data, as shown in Figure 4.5 (B), suggest that the marmoset MID1 native promoter has slightly stronger promoter activity than the human native promoter in Jeg-3 cells. In addition, the insertion of the human LTR upstream of the native marmoset promoter enhances its activity by five-fold, suggesting that the LTR can also act as a transcriptional enhancer of the MID1 promoter in other species.

Figure 4.5 Evolutionary analysis of the HERV-E element. (A) Top left, An ethidium bromide-stained gel of MID1 HERV-E LTR specific products obtained from primate genomic DNA. Bottom left, Hybridization to confirm the authenticity of the amplified products. Note that the PCR product from marmoset DNA is non-specific because it does not hybridize. In both pictures, the arrows represent the expected sizes. Right, Dendrogram illustrating the divergence time point of various primates.
The arrow shows the proposed integration time of the HERV-E element in the MID1 locus. (B) Schematic representation of plasmids used to transiently transfect the Jeg-3 human choriocarcinoma cell line (grey bars). The luciferase activities shown are corrected for transfection efficiency with the Renilla luciferase pRL-TK plasmid and are presented as percentages of the activity of the pGL3p vector. Each bar is the mean of the relative luciferase activity from at least 2 experiments +/- SD.

4.4 Discussion

In the present study we have identified and characterized a hybrid transcript for the MID1 gene. This chimeric mRNA originates from a retroviral LTR belonging to the HERV-E family and is spliced to the coding sequence of the MID1 gene. We have established that the LTR is used as an alternative promoter and that differential utilization of the retroviral promoter generates a MID1 mRNA isoform with a variant first exon. Although the fusion transcript possesses a different 5' UTR, it would encode a midline 1 protein identical to the sequence deposited in Genbank (Y13667 and AF035360) by other groups (Quaderi et al. 1997; Perry et al. 1998).

In this report, we have provided the first description of a functional MID1 promoter. We have demonstrated that a HERV-E LTR transcribes MID1 in a tissue-specific fashion, as high levels of chimeric transcripts were identified in placenta and embryonic kidney but not in other tissues. The evidence for this tissue specificity was strengthened by transient transfections of a luciferase construct containing the MID1 HERV-E LTR in different cell lines. Consistent with the in vivo real-time PCR findings, the promoter activity of the LTR varied significantly depending on cell type but was found to be highest in placenta and embryonic kidney cell lines.
In addition, the LTR was also shown to increase the activity of the non-retroviral MID1 promoter in a placental cell line, suggesting a dual role for the MID1 HERV-E LTR as both an alternative promoter and enhancer. Deletion experiments of the retroviral promoter indicated the presence of a positive regulatory region, necessary for high placental promoter activity, between positions 51 and 96 of the LTR. Although a number of potential transcription factor binding sites were predicted within this region, including possible sites for GATA1, members of the ETS-type family, interferon response factors, C/EBP, AP2 and Sp1, additional studies will be required to identify the transactivating element that enables the retroviral promoter to participate in the transcriptional regulation of MID1. It is noteworthy that the LPE1 and LPE2 motifs, found to be important for the placental expression of the EDNRB LTR as discussed in chapter 3, are not present in the MID1 LTR. It is likely that the MID1-associated HERV-E contributes to placental expression by utilizing different cis- and trans-elements.

The utilization of a retroviral LTR by MID1 raises interesting questions regarding the functional significance of this alternative promoter. The use of multiple promoters and transcriptional start sites is believed to be an important evolutionary mechanism that provides flexibility in the regulatory control of gene expression (Ayoubi and Van De Ven 1996). Alternative promoter usage has been shown to be responsible for the tissue-specific and/or developmental stage-specific expression of a number of genes (Chretien et al. 1988; Borgese et al. 1993; Teerink et al. 1994; Anusaksathien et al. 2001). In several cases, the alternatively transcribed genes studied contained both a tissue-specific as well as a ubiquitous promoter (Ayoubi and Van De Ven 1996). The human porphobilinogen deaminase gene, for example, was found to be transcribed from both a housekeeping promoter as well as an alternative erythroid-specific promoter (Chretien et al. 1988). The expression of MID1 is likely controlled by a similar transcriptional mechanism and this topic is further explored in chapter 5. While our data indicate that the chimeric MID1 isoform is transcribed in a tissue-specific fashion, non-retroviral MID1 transcripts have been reported by others to be ubiquitously expressed (Quaderi et al. 1997; Perry et al. 1998). Combined, these results suggest that MID1 is transcribed in all tissues from one or multiple non-retroviral promoters while the retroviral LTR is responsible for the tissue-specific expression in placenta and embryonic kidney (see chapter 5). It is as yet unclear how or if LTR-driven expression of MID1 in these two tissues contributes to function of the gene or to the pathology of Opitz syndrome. No specific abnormalities in kidney or placental tissues have been described for this genetic syndrome, which is characterized primarily by defects in midline structures such as cleft lip, heart problems, genito-urinary abnormalities, laryngotracheal defects and mental retardation (Robin et al. 1996). Since the LTR is responsible for a significant fraction of MID1 transcripts in certain tissues, it is possible that MID1 function could be affected by activity of the LTR, particularly if small changes in the level of this protein are phenotypically important. However, the potential effects of moderate fluctuations in MID1 levels have not been investigated.

The presence of a functional MID1 promoter within a retroviral LTR is intriguing as HERV elements are found only in primate species.
The insertion of the HERV-E element upstream of the MID1 gene, 30 to 40 million years ago as suggested by our study, may have contributed to the evolution of transcriptional regulatory elements controlling the expression of MID1 in apes and Old World Monkeys but not in New World Monkeys or other mammals. Integration of this additional promoter could have resulted in increased overall expression of MID1 in placenta and embryonic kidney. This hypothesis is supported by our observation that insertion of the retroviral LTR upstream of the marmoset MID1 native promoter region, which does not naturally harbour the MID1 HERV-E element, results in enhanced promoter activity in reporter gene assays. It is possible that the higher levels of MID1 transcripts might have resulted in a phenotype, as the midline 1 protein appears to act in a dose-dependent fashion. For example, carrier females for Opitz syndrome who are heterozygous for MID1 manifest some of the symptoms of the recessive disease (Quaderi et al. 1997; Brooks et al. 1998). Although it is difficult to assess the biological significance of increased MID1 levels, one can assume that integration of the HERV-E was not detrimental to the species or the element would not have been fixed in the population.

In summary, the studies reported here demonstrate that a HERV-E LTR acts as an alternative tissue-specific promoter for the Opitz syndrome gene MID1 and support a biological role for some retroelements in the transcriptional regulation of human genes.

Chapter 5: Widely spaced alternative promoters, conserved between human and rodents, control expression of the Opitz syndrome gene MID1

A paper of the same title by J.-R. Landry and D.L. Mager was published in 2002 in the journal Genomics 80: 499-508.
5.1 Introduction

MID1 mRNA appears to be transcribed from multiple promoters, as several alternative 5' UTRs, including an endogenous retroviral first exon as described in chapter 4, have been identified (Quaderi et al. 1997; Perry et al. 1998; Van den Veyver et al. 1998; Cox et al. 2000). While the expression of MID1 has been reported to be ubiquitous in both fetal and adult tissues (Quaderi et al. 1997; Perry et al. 1998), the transcription pattern of the different mRNA isoforms is unknown. In addition, the promoter regions involved in the generation of the alternative first exons have not been isolated. To gain insight into the transcriptional regulation of MID1, we have characterized the alternative mRNA isoforms and promoter regions of MID1 in humans and other species. In this chapter, we report that alternative promoter usage results in the production of five MID1 transcript isoforms with different expression patterns in humans. We also provide evidence that the alternative first exons and promoters are highly conserved in mammalian evolution.

5.2 Materials and Methods

5.2.1 Database searches and sequence analysis

Alternative MID1 transcripts were identified by searching the EST and nr databases of Genbank using MID1 coding sequence as query. To obtain the genomic sequence of MID1, the high throughput genomic sequence (htgs) database was screened using the nucleotide sequences of the alternative MID1 transcripts AF269101, AF041206, AF041207, AF041208, AF041209, AF035360 and Y13667 as queries. The resulting genomic fragments present in Genbank Accession Numbers U96409.1, AC002349.1, AC079314.26, AC004469.8 and AC008008.2 were assembled using "BLAST 2 sequences" (Tatiana and Madden 1999) to form a contig. The genomic sequence of the mouse Mid1 gene was found in the contig MmX_WIFeb01_348. Multiple sequence alignments of the alternative first exons and promoter regions were performed using ClustalX version 1.8 (Thompson et al.
1994) and Genedoc version 2.6 (Nicholas et al. 1997). Putative transcription factor binding sites were predicted using Alibaba2.1 (http://wwwiti.cs.uni-magdeburg.de/~grabe/alibaba2/).

5.2.2 Rapid amplification of cDNA ends

All oligonucleotides used in this study are listed in Table 5.1. 5' RACE was performed on a placenta cDNA library as described in chapter 4. 5' RACE was also carried out using mouse and rat total adult splenic RNA and mouse full-term placenta RNA. The RNA was extracted using Trizol (Gibco BRL) and reverse transcribed as previously described (Medstrand et al. 1992). The cDNA obtained was poly(A)-tailed using terminal transferase (Gibco BRL) and amplified using an oligo-dT primer, QT (Zhang and Frohman 1997). The second round of PCR amplification was performed using oligos 45 and Qo (the sequence of which is nested in oligo QT (Zhang and Frohman 1997)). A third round of amplification was done using the nested primers 46 and Qi (Zhang and Frohman 1997). The resulting RACE products were cloned into the pGEM-T vector (Promega) and hybridized using oligo 47 as a probe. Positive clones (19 human placenta, 5 mouse placenta, 11 mouse spleen and 4 rat spleen) were sequenced using the vector primers T7 and SP6.

5.2.3 Dot blot hybridizations

A SearchLights™ Human Multiple RNA Dot Blot was purchased from LifeTechnologies and hybridized sequentially with 32P-labelled oligonucleotides corresponding to the alternative first exons of MID1 (Ex1a: oligo 71, Ex1b: 72, Ex1c: 73 and Ex1e: 74) to characterize the tissue specificity of the alternative promoters. To ensure that the probes designed were unique and the hybridization specific, the oligonucleotides used were compared to the Genbank htgs database using Blast.
The dot blot was hybridized in ExpressHyb (Clontech) at 58°C (Ex1a), 48°C (Ex1b), 52°C (Ex1c), 48°C (Ex1e) and 50°C (Ex2) for two hours and washed twice for 20 min in 2X SSC, % SDS at RT followed by one wash for 20 min in 0.1X SSC, 0.1% SDS. As an internal control for the amount of RNA, the dot blot was also hybridized with a GAPDH 32P-labelled oligonucleotide (oligo 28) at 50°C as described above.

5.2.4 Real-time PCR

Total RNA from human adult and fetal tissues was purchased from Clontech and Stratagene. Total fat RNA was extracted from normal breast adipose tissue following reduction surgery using Trizol as described by the supplier (Gibco BRL). Following the elimination of remaining genomic DNA with DNAse (Gibco BRL), first-strand cDNA was synthesized as previously described. Real-time PCR was performed using the protocol listed in chapter 3 and the oligonucleotides described in chapter 4.

5.2.5 Plasmid constructions

MID1 alternative promoter constructs were designed by subcloning the 5' flanking regions of exons 1c, 1d and 1e in the KpnI-BglII site of the promoter-less luciferase plasmid pGL3B (Promega). The promoter regions were amplified from human genomic DNA using oligos 91 and 92 (positions -949 to +184 relative to the transcription start site of exon 1C) for the Promoter C construct, flanking oligos 11-12 followed by oligos 10 and 84 (positions -397 to +91 relative to the transcription start site of exon 1D, which represents the full LTR) for the Promoter D (LTR) construct, and flanking oligos 89-90 followed by oligos 96 and 97 (positions -980 to +42 relative to the transcription start site of exon 1E) for the Promoter E construct. All constructs were sequenced to confirm orientation and sequence integrity using the pGL3p vector primers RV3, RV4 or GL2.
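The sizes of the cloned promoter fragments follow from the coordinates given above. The sketch below is a minimal illustration, assuming the usual convention that positions are numbered relative to the transcription start site with no position 0 (so -1 is immediately followed by +1). The function and dictionary names are hypothetical, and the lengths exclude the restriction-site tails carried by the primers.

```python
def fragment_length(start, end):
    """Length in bp of a fragment spanning start..end (inclusive), numbered
    relative to a transcription start site at +1, with no position 0.
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the span crosses the missing position 0
    return length


# Coordinates as quoted in section 5.2.5
constructs = {
    "Promoter C": (-949, 184),
    "Promoter D (LTR)": (-397, 91),
    "Promoter E": (-980, 42),
}
for name, (start, end) in constructs.items():
    print(name, fragment_length(start, end), "bp")
```

With this convention the fragments come to 1133 bp, 488 bp and 1022 bp. The 488 bp value matches the full-length LTR construct designated 1-488 in chapter 4, and 1133 bp is consistent with the 1.1 kb native promoter region described there.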
5.2.6 Cell culture and transfection conditions

The murine 3T3-L1 preadipocytes were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal calf serum and antibiotics. Differentiation was induced by incubating 2 day post-confluent cells in DMEM supplemented with 10% fetal bovine serum and a hormonal mixture composed of 0.5 mM 3-isobutyl-1-methylxanthine (Sigma), 1 µM dexamethasone (Sigma) and 10 µg/ml insulin. After 48 hours, the supplemented media was removed and replaced with DMEM containing 10% serum. Other cell lines were cultured as described in chapter 3.

Transient transfections of 3T3-L1 cells were performed 96 hours after the addition of differentiating media using Lipofectamine Plus (Life Technologies Inc). Typically, differentiated 3T3-L1 cells were cotransfected with 10 µl of lipofectamine, 6 µl of Plus Reagent, 2 ng of plasmid DNA and 10 ng of the Renilla luciferase vector pRL-CMV according to the manufacturer's instructions. Transfections of other cell lines were performed as previously described.

5.2.7 Isolation of the 1C and 1E genomic regions in other species

Genomic DNA was prepared from pig, dog, cow and a marmoset cell line as previously described (Goodchild et al. 1993; Mager et al. 2001). The putative promoter region of exon 1C was amplified using oligos 91-92 and the sequence 5' of exon 1E using flanking oligos 89-90 followed by 96-97. The amplified products were cloned in the pGEM-T vector (Promega) and sequenced using vector primers SP6 and T7. The genomic sequences for the mouse and rat promoter C and E regions were compiled from searches of mouse and rat shotgun reads in Ensembl using SSAHA (http://www.sanger.ac.uk/Software/analysis/SSAHA/).
The mouse promoter C region was derived from accessions M12C-c69e05.qlc, G10P6105312fg4.to, G10P668956RD9.to, G10P669627FD5.to, G10P622093FAll.to and the rat promoter C from 19866903216055, 19866906363667, 19866917802031. The mouse promoter E region was compiled using Kyj26d06.bl, G10P604641RF5.to, G10P697760RG6.to and the rat promoter E region from 19866901038635 and TuwdplE247717.

Table 5.1 Oligonucleotides used for MID1 5' RACE and other procedures

Number  Sequence                                              Purpose/Location
10      gaagatctTGTAGCAAGACAAGCCGCAG                          MID1 LTR BglII
11      CCCAAGACCAGTCCTGTGAAG                                 MID1 LTR flanking-1
12      CTCACCCTCAGAAGACTACTTG                                MID1 LTR flanking-2
28      GTTGCTGTAGCCAAATTCGTTGTC                              MID1 Gapdh-3
37      GGTGATTAACAGACGGTTGAG                                 MID1 Ex1d-1
45      CAGTTCTGACTCCAGTGTTTCCATC                             MID1 Ex2-3
46      CAGGCAAAGCTCTCTTGTGTCATC                              MID1 Ex2-2
47      AACCCAAGGAAGCTGATCAG                                  MID1 Ex2-1
48      GGTGGCAGCTTTGAGTGAG                                   MID1 Ex2-4
49      TGGATGAGTTTAGCCAAAAGG                                 MID1 Ex3
60      GCCCAGGATGCCCTTGA                                     MID1 Gapdh-1
61      GTGTCCCCACTGCCAAC                                     MID1 Gapdh-2
71      GTAGGTCCTTCAAGCAGGCTGGGCTGG                           MID1 Ex1a
72      CACAGTGGAGACAAGCAGATGGCAG                             MID1 Ex1b
73      CCTCATGTTTAGAATCCCATGACCCATC                          MID1 Ex1c
74      CCCACGAAATGATAGACAGGCCACC                             MID1 Ex1e
84      cggggtaccTGAGAGAAGAGAGACAGACC                         MID1 LTR KpnI
89      cggggtaccGGTGGAGTCCAAATGTCTG                          MID1 PromE flank-1
90      gaagatctGTAGGCAGAGGATGGGG                             MID1 PromE flank-2
91      cggggtaccCTCTTTGCTCAACTTGCACT                         MID1 PromC KpnI
92      gaagatctGAAACACCGAACCCGACA                            MID1 PromC BglII
96      cggggtaccTTGTCTTGCATGGTGATAAG                         MID1 PromE KpnI
97      gaagatctGGTGGGTGGCAGGCAGT                             MID1 PromE BglII
QT      CCAGTGAGCAGAGTGACGAGGACTCGAGCTCAAGCTTTTTTTTTTTTTTTTT  5' RACE
Qo      CCAGTGAGCAGAGTGACG                                    5' RACE
Qi      GAGGACTCGAGCTCAAGC                                    5' RACE

5.3 Results

5.3.1 Characterization of the genomic structure of the alternative first exons of MID1

At the time this study was initiated, the numerous MID1 5' UTR isoforms reported in the literature (Quaderi et al. 1997; Perry et al. 1998; Van den Veyver et al. 1998; Cox et al.
2000) and our characterization of a retroviral first exon for MID1 (see chapter 4) suggested the presence of several alternative promoters. To understand the origin of these alternative transcripts and to identify possible elements involved in their transcriptional regulation, we determined the relative genomic locations of the alternative first exons by searching Genbank using different MID1 mRNA isoforms as query sequences. Comparison of the transcript variants to the genomic DNA suggested the existence of five different first exons for MID1. These alternative 5' UTRs as well as the coding exons 2 and 3 of MID1 were found to reside on a variety of unassembled sequence fragments in Genbank accession numbers U96409.1, AC002349.1 and AC079314.26 while the coding exons 4 through 10 were present in a contig in accession AC008008.2. Assembly of the MID1 genomic DNA revealed that the alternative first exons of MID1, which we have named exons 1a to 1e, were dispersed over 250 kb on Xp22 (see Figure 5.1), increasing the total length of the MID1 gene to 385 kb. The most upstream of the 5' UTRs, exon 1a, was found to reside at least 265 kb from exon 2 while the closest, exon 1e, was present within 10 kb of the second exon (see Figure 5.1). It should be noted that exons 1c and 1d (retroviral) were respectively referred to as exons 1N and 1R in chapter 4.

[Figure 5.1 schematic: panel A marks the ATG and labels the intervals between successive exons as 156 kb, 57 kb, 30 kb, 13 kb and 9 kb; panel B shows the cDNA isoforms.]

Figure 5.1 Genomic organization of alternative MID1 first exons. (A) Gene structure of the 5' region of human MID1, in which the exons are numbered from the 5' end and depicted as boxes. Distances between the five alternative noncoding first exons, which are labelled 1a to 1e, are indicated in kilobases. Exon 1c was previously referred to as exon 1N while exon 1d was named exon 1R in chapter 4. The arrow represents the retroviral LTR promoter of exon 1d. (B) Schematic representation of the alternative human MID1 cDNA isoforms.
Use of the alternative first exons in the MID1 transcript does not alter the midline 1 protein structure as the different 5' UTRs splice into a common second exon which contains the initiation codon.

5.3.2 Identification of the 5' ends of human MID1 isoforms

To confirm the variety of alternative 5' UTRs, we performed 5' RACE on human placental cDNA and sequenced 19 of the clones obtained. Characterization of the RACE products revealed the presence of four different first exons and confirmed the existence of a retroviral 5' UTR. Four of the 5' RACE clones analysed had UTRs similar to the MID1 transcript of accession number Y13667 (exon 1c), seven possessed retroviral 5' UTRs like the chimeric mRNA of AF041208 (exon 1d), while six of the sequences resembled AF035360 (exon 1e) (see Figure 5.2). Interestingly, two of the sequenced clones contained a variant of the 5' UTR present in accession AU140398 (exon 1a) as a result of the utilization of an upstream splice donor site. While possessing high identity (~99%) to the reported 5' UTRs, the RACE clones obtained for the four alternative first exons 1a, 1c, 1d and 1e were significantly longer than the sequences deposited in Genbank, suggesting that the previously characterized MID1 transcripts were not full length. Representative RACE sequences corresponding to the four alternative 5' UTRs are shown in Figure 5.2.

Sequence analysis of additional MID1 5' variants present in the databases supported the existence of a fifth alternative first exon for the MID1 gene (exon 1b), resembling accession AF041206 (see Figure 5.2). A sixth alternative 5' UTR found in accession AF269101 and consisting of two first exons (exons 1a and 1c) spliced together might also exist but the presence of unknown sequence at the 5' end of the cDNA matching nothing in the Genbank databases undermines the validity of this transcript type. Although further
Although further  isoforms have been reported, examination of these transcripts revealed the presence of contaminating vector sequence or probable chromosomal rearrangement or cloning artifact (in the case of accession AF041207 where the first 500 bp of the cDNA sequence has high  126  caacttataaaatgttacacttttttcccctacaatgccttcagataatatgaggctggtcacagcccatgatataaaat i  GCCAGGATAAAGATGATTTTAATTAAATTGC ACAAATCCCTTTGTACTGCAACCAAACCCCTACGCAGGCTGATCAACTCTGAGCCATTCTGGTGTGTATAGAGTTAACCA GGTTACCAGGATTCCTAGTTAACTTCTTTCACCAGGAATGCAATTCATGTTTACCTATATAAGGAAACCTGTTAGGCTGA CCTGAGTTTGGCGGATTTTTGCGCTGACTGAGAGGCGTGTAAGCACACACCCTGGAGAGGGGGAGTGATTCCGAGCTGGA CAGAGCAAGCTTCCTGAACGCCCC^GCCATTCAGCTGATCCGAGGCAAAGAGCCGGQACTTGCCCCAGCCCAGCCTGCTT GAAGGACCTACAGI GTTTGTCTCTTCCAGATCAGAACTGAGGAACAAAAACCCCCATCCTGGGAAAAATGGGGAAGCTG actaactgttgactcacagtcacactacaagggccaattttacagact!  Exon 1A variant  Exon 1A  g t g a g t c t t a c c a c t c t c t c t c t g c t t g c t t c t g c t a a g t g g c / 156 kb / t g t c a c a g a t t g c a t t g t c t c t t g g gaaactcatccatcaaaccgtcacctctgattctatgcagggtcactatgaaagaggcagcctgccccagtttgggtatt  CTGAAGTGTTAGTATGTAGTGGAGCTGGTGGTGTGAAGCTGATTCATTTTGTTTCAAAGCCCCAAATGGCATGCTGCCGA ACACAGGCACAAGCACAGCAGCGTCTCCTTAGAAATAATGACTCCAAGGCAAACAGCCCTCATTGCAAAATGTAGAAGAC gtaagtcttcttttta TGCCATCTGCTTGTCTCCACTGTGGTTTATTCGTTTCAGGGTTTTAAATGACGTAATAAAAAG aaaactggaggggtaagtgggaattccag  Exon IB  / 57 kb / a g c c t t t c t g g c t c c c a g g a c c c a c c c c c t t g c t t c t g a t  tggcgaagctcctccgcggcccaagccagccacctacaacgtctctgagaactgatttgaaataagagccaggcggtcct  CGCTTCCCTGCGACCGACTTTTCATAGGCTGGGGGAGAAAAGGTTGAGAAACTTGACATTGTCTCGAGCAGAGTGCGTGT AGCAACAGATCAAAGAAAAAGAGGACGAAAACGTGCTCTTTGCTGCCCGTAGATTTCGCCGGGTTGCTTTTGTCTTGCGG GGCTCCTGTCGGGTTCGGTGTTTCCGCTCTGAAGACTGCGACGCGGGCTCCGATGCAGCTCGCTCCCTGCCGGATGGGTC gtaagctcaggcggttgcgctg / 30 kb / g c t t g c t c a a t c g a t c a c a a c c ATCGGGATTCTAAACATGAGGCAG  Exon IC  
ctcttacgtggaccccccttagagttgtgagcccttaaaagggacaggaattgctcactcggggagctcggctcttgaga  CAGGAGTCTTGCTGATGCCTCTGGCCAAATAAACCCCTTTCTTCTTTATCTCGGTGTCTGAGGAGTTTTGTCTGCGGCTT GTCTTGCTACATTTCTTGGTTCCCTGACCAGGAAGCGAGGTGATTAACAGACGGTTGAGGCAGCTCCTTAGGTGGCTTTA GCCTGCCCTGTGGAACATCCCTGCGGGGGACTCCAACCAGCCAGAGCGACGCGGATCCTGAGAGCGCTCCCGGGTAGGCA TTTGCCCAGGTGGGACGCCTCGCCAGAGCCGTGTGTGGCAGGCCCCCGTGGAGGATCAACGCAGTGGCTGAACACTGGGA g t a a g a c t a g t c t t t g g a a c t t g c / 13 kb / gcaaccc AGGAACTGGCACTTGGAGTCCAGACATCTAAAACTTG  Exon ID (retroviral)  tggcggagtggatgaggtggagctggccttcctggaattgttggaccgattttcctgttgtgtttcactgttgacgcact  CACAGACACACAGGACCGCTCCAGCACTGCCTGCCACCCACCGTCTGGTCTCGGTGGCCTGTCTATCATTTCGTGGGTCC gtgggtattcaaggtgatctcttt CCATCCTCTGCCTACGGCGATGTTTCTTCAAAAAGAACTAGTGTGCAGTCCATTG gtcagaatctagagaggtgctcaaagaa ttacactcttgtttccag  Exon IE  / 9 kb / t g a c a c c c c c c t a t g g t t g a c c t c c c t g t g c c t a a t c a a a t c  ATAGCTGATCAGCTTCCTTGGGTTTTGCTGATGACACAAGAGAGCTTTGCCTGAAGATGGA  AACACTGGAGTCAGAACTGACCTGCCCTATTTGTCTGGAGCTCTTTGAGGACCCTCTTCTACTGCCCTGCGCACACAGCC  Exon 2  TCTGCTTCAACTGCGCCCACCGCATCCTAGTATCACACTGTGCCACCAACGAGTCTGTGGAGTCCATCACCGCCTTCCAG  Figure 5.2  Nucleotide sequence of the heterogeneous human MIDI 5' UTRs. The  alternative first exons of MIDI and the 5' sequence of exon 2 are enclosed in boxes and separated by gaps of known length. Exon sequence is in uppercase and coding sequence is underlined in exon 2 while introns and the 5' flanking region are in lowercase. Putative T A T A boxes are shown in bold and splice donor splices are underlined. Each line of sequence contains approximately 80 nucleotides. Two variants forms exist for exon la because of the presence of two different splice donor sites. The nucleotide sequences of representative R A C E clones have been submitted to Genbank with accession numbers AY112900, AY112901, AY112902 and AY112903.  127  identity to genomic D N A from chromosome 19 but no similarity to sequence from the X chromosome). 
5.3.3 Tissue distribution of alternative MID1 5' UTRs in human

To evaluate the relative abundance of transcripts arising from promoters A, B, C and E, antisense oligonucleotides were designed from the alternative 5' UTRs of MID1 and used to hybridize a commercially prepared multi-tissue RNA dot blot. As shown in Figure 5.3(B), the results of the hybridizations indicate that the alternative MID1 promoters differ in their strength and tissue specificity. The data suggest that promoter A is utilized predominantly in adipose tissue. The other non-retroviral promoters, on the other hand, appear to initiate MID1 transcripts more ubiquitously, although levels of promoter usage are highly variable. Due to the repetitive retroviral nature of exon 1d, a unique probe could not be designed from this 5' UTR for hybridization purposes. However, the expression pattern of the chimeric transcript was previously examined using real-time PCR (chapter 4). In brief, the real-time experiments revealed the retroviral promoter to be tissue-specific, as chimeric transcripts were found to be highly expressed only in placenta and at a somewhat reduced level in embryonic kidney (see chapter 4).

Tissues listed in panel (A) of Figure 5.3: parotid, throat, kidney, thyroid, esophagus, trachea, bronchial, lung (left), lung (right), diaphragm, bladder, pancreas, prostate, adrenal, testis, tonsil, uterus, thymus, breast, spleen, ovary, lymph node, appendix, atrium (left), atrium (right), ventricle (left), ventricle (right), interventricular septum, pericardium, frontal lobe, temporal lobe, occipital lobe, parietal lobe, thalamus, small intestine, colon, pons, rectum, cerebellum, liver, spinal cord, gallbladder, stomach, skeletal muscle, tongue, adipose tissue and placenta, plus human genomic DNA and plasmid DNA.

Figure 5.3 Tissue-specificity of MID1 exons 1a, 1b, 1c and 1e.
(A) Tissue key of the normal human total RNA dot blot. (B) Hybridization of the multi-tissue blot with isoform-specific probes. The exon-specific oligos utilized for the hybridization are indicated under each panel. (C) The blot was also hybridized with a GAPDH probe to assess overall levels of RNA.

5.3.4 Expression pattern of total MID1 transcripts in human adult tissues

To examine the overall expression of MID1 in various tissues, real-time PCR was performed on human fetal and adult cDNAs from various tissues. Amplifications were performed using oligos from a region of MID1 common to all isoforms (in exons 2 and 3) to assess the total level of MID1 transcripts originating from all promoters. Additional PCRs with oligos from GAPDH were also done to determine the quality and quantity of the RNA. Our real-time results indicate that, as a group, MID1 transcripts (including all 5' isoforms) are widely expressed, although levels vary significantly between tissues (see Figure 5.4). MID1 mRNAs were found by real-time PCR to be most abundant in lung, trachea, mammary gland, colon, adipose, placenta and kidney. Although MID1 transcripts could be detected in all cDNA sources tested, several tissues had low levels of expression. Skeletal muscle, bone marrow, spleen as well as most fetal tissues tested contained significantly reduced levels of MID1 transcripts relative to the levels identified in high-expressing tissues. Combined, these results confirm that MID1 is ubiquitously expressed but also demonstrate a high level of variation in expression.

Figure 5.4 MID1 overall expression level as measured by real-time PCR. Total cDNAs from various human tissues were subjected to real-time PCR using primers to detect either total MID1 transcripts or GAPDH mRNAs.
The relative abundance of overall MID1 transcripts in each tissue normalized by GAPDH levels is depicted by bars +/- SD.

5.3.5 Sequence analysis of the regions upstream of the alternative 5' UTRs

To identify putative promoter transcriptional regulatory elements, we analyzed the 5' flanking regions of the alternative first exons of MID1. These searches revealed that only 2 of the 5 putative promoter regions, A and D, contained possible TATA boxes (see Figure 5.2). A consensus TATA box was present approximately 50 bp upstream of the most 5' RACE product identified for exon 1a, while a non-canonical TATA sequence was found approximately 40 bp upstream of the furthest extent of exon 1d. In addition, both promoter sequences were found to contain CAAT motifs. Although promoter regions B, C and E did not possess recognizable TATA boxes, they harboured some relatively GC-rich segments. The overall G+C content of the regulatory sequences present within 150 bp upstream of exons 1b, 1c and 1e was 55%, 61% and 56%, respectively.

5.3.6 Promoter activity of the 5' flanking sequence of exons 1C, 1D and 1E

To assess the functional transcriptional activity of predicted MID1 promoter regions, the 5' flanking sequences of exons 1c, 1d and 1e were cloned into the luciferase pGL3B plasmid in the sense orientation. The resulting constructs containing the putative promoters C, D and E were transiently transfected into the following cell lines: 1) 3T3-L1, a mouse differentiated pre-adipocyte cell line; 2) A549, a human lung carcinoma cell line; 3) HepG2, a human hepatoma cell line; 4) 293, a human embryonic kidney cell line; and 5) Jeg-3, a human choriocarcinoma (a substitute for placenta) cell line.
As shown in Figure 5.5, the genomic region upstream of exon 1c had strong promoter activity in most cell lines (over 20 times the levels obtained with the promoterless vector pGL3B), with the exception of the 3T3-L1 cell line, where the promoter activity was relatively low. Promoter D, the retroviral LTR, was also active in most cell lines. However, results from the 3T3-L1 (adipose-like) and Jeg-3 (placental) cell lines differed between promoters C and D, with the latter exhibiting much higher activity in these 2 cell types. In 3T3-L1 cells, reporter activity was 20-fold above background, while in Jeg-3 cells the luciferase level obtained with construct D was 400-fold higher than background and nearly 5 times the levels obtained with the SV40 promoter. In contrast to promoters C and D, the level of luciferase activity produced by the construct containing the promoter E region was minimal (between 1.5- and 6.5-fold above the pGL3B background level) in all cell lines tested, suggesting that additional enhancer sequences are required for the expression of exon 1e.

Figure 5.5 Functional analysis of alternative MID1 promoters. Schematic representation of the promoter constructs used to transiently transfect the 3T3-L1 (crossed bars), A549 (checkered bars), HepG2 (grey bars), 293 (white bars) and Jeg-3 (black bars) cell lines. The promoter constructs C and E contain a ~1 kb region upstream of exon 1c and 1e, respectively, while the promoter D construct includes the 5' LTR of a HERV-E element. The luciferase activities obtained with each plasmid are corrected for transfection efficiency with the Renilla luciferase pRL-TK plasmid and are presented as fold increase over the activity of the promoter-less pGL3B vector, which was assigned a value of 1. Each bar is the mean of the relative luciferase activity from at least 2 experiments +/- SD.
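The normalization described in the legend of Figure 5.5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes that paired firefly and Renilla readings are available for each replicate, and the function names and all numbers are hypothetical rather than taken from this study. Each firefly reading is divided by its Renilla reading to correct for transfection efficiency, and the result is expressed relative to the mean ratio of the promoter-less pGL3B vector, which therefore has a value of 1.

```python
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Correct a firefly reading for transfection efficiency (Renilla co-reporter)."""
    return firefly / renilla


def fold_over_vector(construct_wells, vector_wells):
    """Fold activity of a construct over the promoter-less vector, per replicate.

    Both arguments are lists of (firefly, renilla) readings.
    """
    vector_mean = mean(relative_luciferase(f, r) for f, r in vector_wells)
    return [relative_luciferase(f, r) / vector_mean for f, r in construct_wells]


# Hypothetical readings in arbitrary light units (NOT data from this study).
vector = [(100.0, 50.0), (120.0, 60.0)]       # promoter-less pGL3B
construct = [(8000.0, 40.0), (9000.0, 50.0)]  # a promoter construct

folds = fold_over_vector(construct, vector)
print(f"fold activity: {mean(folds):.1f} +/- {stdev(folds):.1f}")
```

With this convention, a construct that behaves like the empty vector gives a fold value close to 1, which makes values such as the 1.5- to 6.5-fold range reported for promoter E easy to compare across cell lines.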
5.3.7 Isolation and comparison of variant Mid1 first exons in other mammals

Since the human MID1 gene has several alternative promoters and 5' UTRs, we decided to investigate whether transcript isoforms for Mid1 existed in other species. Database searches revealed that alternative Mid1 first exons were indeed utilized in different mammals, as shown in Figure 5.6. Mouse Mid1 transcripts containing 5' UTRs with high identity to human MID1 exons 1a and 1e were identified in Genbank accession numbers Y14848 and AF186460, respectively. In addition, another mouse cDNA present in accession number AF026565 was found to contain a novel 5' UTR, exon 1f, which had not been isolated in human MID1 transcripts. Finally, a pig Mid1 EST with similarity to exon 1c was found in accession BF444549 as well as a rat Mid1 transcript with a 5' end corresponding to exon 1e (see Figure 5.6). Comparison of the mouse 5' UTRs, including the newly identified 5' end, to mouse genomic DNA indicated that the relative order of the shared first exons was conserved between human and mouse and that exon 1f, which was not isolated in human, resides between exon 1a and 1c.

To continue the search for 5' UTR variants in rodents, we performed 5' RACE on mouse spleen and placenta, and on rat spleen cDNA. Analysis of the 16 mouse and 4 rat clones extended the known 5' heterogeneity of Mid1 in both species. As shown in Figure 5.6, two of the rat 5' RACE sequences had high identity to the mouse exon 1f while the other two clones did not possess a first exon but instead started at the first nucleotide of exon 2. Eleven of the mouse 5' RACE clones were similar to the human and pig exon 1c while the other five resembled the rat Mid1 sequences without first exons. However, unlike the rat clones, the mouse sequences with no first exons started approximately 80 bp upstream of exon 2 (exon 2 variant).

5' end                    Species   Genbank   RACE
Exon 1a                   Mm        1         0
Exon 1f                   Mm, R     1, 0      0, 2
Exon 1c                   Mm, P     0, 1      11, 0
Exon 1e                   Ms, R     1, 1      0, 0
Exon 2 (no first exon)    R         0         2
Exon 2 variant (v2)       Mm        0         5

Figure 5.6 Alternative Mid1 transcripts in other species. (A) Structure of rodent and porcine Mid1 cDNA isoforms. Distances between the heterogeneous 5' ends of Mid1 in mouse are indicated. (B) The species in which the alternative first exons have been identified are listed on the right. Mus musculus is abbreviated as Mm, Mus spretus as Ms, rat as R and pig as P. The number of Mid1 sequences in Genbank and 5' RACE clones containing each first exon are also noted. While no Mid1 transcripts containing exon 1b have been detected in rodents, a substitute alternative 5' UTR, exon 1f, located between exon 1a and 1c, is used in both mice and rat. In addition, Mid1 5' ends with no first exons are also found in rodents.

The 5' UTR sequences of MID1 appear to be highly conserved in evolution, as multiple sequence alignment of the human alternative first exons with the 5' UTRs of mouse, rat and pig Mid1 cDNA revealed strong sequence homology (see Figure 5.7). For example, the identity between the human and mouse exons 1a was found to be 79% over the entire length of these non-coding exons. Comparison of the human-murine and human-porcine exons 1c showed an average identity of 81% and 79%, respectively, while the overall sequence identity between the rodent exons 1e and human exon 1e was 85% and 87%. This degree of identity between the 5' non-coding exons of MID1 is striking as it approaches the level of similarity of the coding exons of MID1 from different species. The identity at the nucleotide level between the human and mouse MID1 coding sequences was found to be 92%, only 5 to 10% higher than that of the 5' UTRs.
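The percent identity values quoted above are the kind of figure that can be computed directly from a pairwise alignment. The sketch below is a minimal illustration, not the procedure used in this work: the function name, the treatment of gaps (a gap opposite a nucleotide counts as a mismatch, columns with gaps in both sequences are ignored) and the toy sequences are all assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: a gap ('-') opposite a nucleotide is a mismatch,
    and columns where both sequences carry a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


# Toy alignment invented for illustration (NOT MID1 sequence).
human = "ACGTACGT-A"
mouse = "ACGAACGTTA"
print(percent_identity(human, mouse))
```

Different gap conventions shift the result by a few percentage points, so identities are only comparable when they are computed in the same way for the non-coding and coding exons.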
Figure 5.7 Sequence alignment of the human, rodent, and porcine first exons of MID1. Nucleotide sequences of alternative MID1 5' ends from human, mouse, rat and pig. Sequence of exon 1a is shown in panel (A), of exon 1c in (B), of exon 1e in (C), and of exon 1f in panel (D). In panel (A), HumanVar corresponds to the exon 1a variant isolated by 5' RACE.
The sequence present at the 3' end of all alternative first exons is of the invariant MID1 exon 2. The nucleotide sequences of novel 5' isoforms have been submitted to Genbank with accession numbers AY112904, AY112905 and AY112906.

5.3.8 Comparison of MID1 promoter regions C and E between species

We sequenced the 5' genomic region upstream of exons 1c and 1e in various species to determine if the high level of nucleotide conservation present in the first exons extended to the promoter regions and to possibly identify conserved transcriptional regulatory elements. As shown in Figure 5.8, multiple sequence alignment of the human promoters C and E with the orthologous regions of different mammals revealed a striking degree of homology. For promoter C, overall sequence identity along nearly 1000 nucleotides was 72% between human and dog, 66% between human and rat and 65% between human and mouse (data not shown). The 1 kb promoter E region shared 73%, 69% and 61% identity with the orthologous region in dog, pig and mouse, respectively. Within the promoter regions, the identity between human and the above species increased for sequence closer to the transcription initiation site, with identities ranging from 82% to 85% for the 300 bp upstream of exon 1c and 74% to 83% for the 300 bp 5' of exon 1e. Some conserved sequence motifs for transcription factors were also identified in the regulatory region flanking the transcriptional initiation sites of exons 1c and 1e. In promoter C, two putative Sp1 and two C/EBPα binding sites were detected, while several Sp1 sites were found in the MID1 promoter E of several mammals (see Figure 5.8).
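A simple way to look for candidate Sp1 sites in a promoter sequence is to scan both strands for the GC-box core. The sketch below is a hedged illustration only: the text does not state which tool or motif definition was used for the searches above, so the core motif `GGGCGG`, the function names and the toy sequence are assumptions, and a real analysis would normally rely on position weight matrices rather than an exact string match.

```python
import re

SP1_CORE = "GGGCGG"  # assumed GC-box core; not taken from this thesis


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif):
    """Return sorted (position, strand) hits of an exact motif on either strand.

    Positions are 0-based on the forward strand; a lookahead is used so that
    overlapping occurrences are also reported.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


# Toy sequence invented for illustration (NOT the MID1 promoter).
toy = "ttGGGCGGaaCCGCCCtt"
print(find_motif(toy, SP1_CORE))
```

Exact-match scanning of this kind is only a first filter; restricting attention to hits that are conserved across the aligned species, as in Figure 5.8, reduces the number of chance matches.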
Species aligned in Figure 5.8: panel (A) human, marmoset, dog, mouse and rat; panel (B) human, marmoset, dog, pig, cow, mouse and rat.

Figure 5.8 Multiple sequence alignment of MID1 promoter regions C and E. Comparison of the 300 bp promoter sequence upstream of exons 1c and 1e between various species is shown in panels (A) and (B), respectively. Nucleotides identical in over 60% of the promoter sequences are highlighted in black. Arrows represent the transcription initiation site identified by 5' RACE or present in the Genbank database. The nucleotide sequences of promoter regions C and E in different mammals have been submitted to Genbank with accession numbers AY112907, AY112908, AY112909, AY112910, AY112911 and AY112912.

5.4 Discussion

In this report, we established that the human, mouse and rat MID1 genes are transcribed from multiple promoter regions. The use of the alternative transcription initiation sites results in MID1 mRNA isoforms with heterogeneous 5' ends but identical coding regions.
In human, MID1 expression is likely controlled by at least five promoter regions generating 5 alternative first exons. The most distal promoter, upstream of exon 1a, is situated over 250 kb from the second exon, which contains the translation start site. The other human promoters B, C, D and E are separated by distances ranging from 109 to 9 kb from exon 2. Our results suggest that the majority of the identified MID1 promoters are ubiquitously active while two, promoters A and D, are regulated in a tissue-specific fashion. Transcript variants containing exons 1a and 1d were found primarily in adipose tissue and placenta, respectively. The proposed expression pattern of the 5' UTRs and associated promoter regions was confirmed by transfection studies where the activity of promoter D was determined to be much higher in the placental cell line than in other cell lines while the activity of promoters C and E did not vary significantly between cell types.

In rodents, transcription of the Mid1 gene appears to be directed by at least six different promoters, including three of the regulatory regions identified in human. Mid1 transcript isoforms containing 5' ends with high identity (>80%) to human exons 1a, 1c and 1e were isolated in mouse while a Mid1 mRNA with a first exon resembling human exon 1e was present in rat. Two novel 5' UTRs not identified in human were also found in rodents. One novel first exon, 1f, was shared between mouse and rat, while the other isoform, exon 2 variant, had no first exon sequence but instead started 80 bp upstream of the beginning of exon 2. Analysis of the orthologous region upstream of the common alternative first exons of rodents and several mammals revealed that the high level of homology found between the 5' UTRs was also maintained in the transcriptional regulatory sequences, suggesting that at least two of the heterogeneous promoters were conserved and likely utilized in many species.
Alternative promoter usage is believed to be an important evolutionary mechanism that provides flexibility to the transcriptional regulation of genes (reviewed in Ayoubi and Van De Ven 1996). Several genes have been reported to possess alternative transcription initiation sites that are used differentially in different tissue types (Bonham et al. 2000; Esterbauer et al. 2000; Munoz-Sanjuan et al. 2000; Anusaksathien et al. 2001). For instance, the human Src (Bonham et al. 2000) and porphobilinogen deaminase (Chretien et al. 1988) genes have been found to be transcribed from both housekeeping and tissue-restricted promoters. Our investigation indicates that the expression of MID1 is likely to be controlled by a similar transcriptional mechanism where multiple promoters direct the tissue-specific and ubiquitous transcription of the various mRNA isoforms.

Besides contributing a mechanism for the differential expression of genes at the transcriptional level, the use of heterogeneous promoters generates mRNA isoforms containing alternative 5' UTRs for which the translational efficiency may vary. The utilization of long 5' UTRs with secondary structures (Kozak 1991) or AUG codons upstream of the genuine initiation site (Kozak 1995) can impede the scanning of the ribosome and/or the correct initiation of translation, resulting in decreased translational efficiency. Sequence analysis of the alternative MID1 5' ends revealed that the first exons, which vary in length between 101 and 364 bp, contain different numbers of initiation codons. The presence of one upstream AUG was detected in exons 1a, 1d and 1e while six initiation codons were identified in exon 1b and three in exons 1c and variant 1a. It is therefore likely that the differentially transcribed MID1 mRNA isoforms are also translated at different levels.
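Counting upstream AUG codons in a 5' UTR is a straightforward string scan, sketched below. The example is illustrative only: the function name and the toy sequence are hypothetical, and the assumption that every AUG is counted regardless of reading frame is ours, since the criteria used for the counts reported above are not stated here.

```python
def count_upstream_augs(utr):
    """Count ATG triplets (AUG in the mRNA) in a 5' UTR, in any reading frame.

    Accepts DNA or RNA letters in either case.
    """
    seq = utr.upper().replace("U", "T")
    return sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG")


# Toy sequence invented for illustration (NOT a MID1 5' UTR).
toy_utr = "ggcATGccaugtttATGa"
print(count_upstream_augs(toy_utr))
```

Applied to each alternative first exon up to the genuine initiation codon in exon 2, a count of this kind gives a quick indication of which isoforms are more likely to be translated inefficiently under the scanning model.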
It is interesting that the alternative first exons and associated promoters, which are proposed to be important for the regulation of MID1, are highly conserved between human and other mammals. While the average identity for a large dataset of human and mouse 5' UTRs was calculated to be between 67.5% (Makalowski et al. 1996) and 75.9% (Waterston et al. 2002), the nucleotide conservation between human and mouse MID1 5' ends was found to be 79% for exon 1a, 81% for exon 1c and 85% for exon 1e. In addition to the 5' UTRs, the promoters C and E of dog, rodent and human were also found to share strong sequence homology and had average identities of 79% to 85% over 300 bp. Combined, the strong sequence conservation of both the alternative 5' UTRs and upstream regulatory regions of MID1 suggests the evolution of regulatory mechanisms that are preserved between species.

There has likely been selective pressure for MID1 to acquire and maintain multiple promoters during evolution, perhaps to prevent serious consequences if one promoter is mutated. If so, one might predict that pathogenic mutations affecting a single MID1 promoter will not be found. This "safety in numbers" theory could also provide an explanation for why insertion of the endogenous retrovirus that provides promoter 1d was allowed to become fixed in higher primates (see chapter 4). However, it is also possible that some promoters, while ubiquitous in adult tissues, may display spatial- or time-restricted activity in early development. In this case, each promoter may play an important specific role. Further studies will be necessary to address such questions.

Chapter 6: Conclusion and General Discussion

The overall objective of my thesis was to further understand the role of HERVs in human gene regulation by investigating LTRs that provide alternative promoters to cellular genes.
This aim was fulfilled by elucidating the contribution of HERV-E retroviral elements in the transcription of the human APOC1, EDNRB and MID1 genes. Most importantly, we were able to establish that LTRs act as alternative promoters for the above-mentioned genes and demonstrated that the retroviral promoters of the EDNRB and MID1 genes contribute significantly to transcription in placenta. We were also able to determine critical cis-elements in the LTRs necessary for strong placental promoter activity.

Prior to this project, only three genes, which are listed in Table 1.3, had convincingly been shown to be transcribed from HERV regulatory sequences. However, in each case, the function of these cellular genes as well as the importance of their retroviral promoters were, and still remain, unknown. Therefore the detailed analysis presented in this thesis of three HERV-E LTRs associated with the expression of the well-characterized APOC1, EDNRB and MID1 genes contributes significantly to our knowledge regarding how retroviral elements can modulate and participate in the transcription of human genes.

It is intriguing that HERV-E elements are repeatedly found involved in gene regulatory functions although these elements are not as numerous as some other HERV families in the human genome (Mager and Medstrand In Press) (see Table 1.1). As previously mentioned, the HERVs that contribute alternative promoters to the APOC1, EDNRB, and MID1 genes as well as the elements that have been reported to enhance the expression of the amylase (Ting et al. 1992) and pleiotrophin genes (Schulte et al. 1996) all belong to the HERV-E retroviral family.
In addition, we have recently discovered through database searches that a solitary HERV-E LTR situated on chromosome 16q24 (accession NT_010542) with very high identity (93%) to the MID1 5' LTR is also involved in cellular gene regulation by serving as a polyadenylation signal for the human P-R domain zinc finger protein (PRDM7) mRNA (LocusLink ID 11105) (unpublished).

Several questions regarding the contribution of retroviral regulatory elements in the expression of cellular genes remain to be answered. The characteristics that enable some, but not other similar LTRs, to act as alternative promoters remain to be determined. Harbouring an active promoter is likely not sufficient to control adjacent cellular gene expression. Several LTRs belonging to diverse retroviral families have been shown to possess promoter activity (Feuchter and Mager 1990; Baust et al. 2001; Schon et al. 2001), suggesting that a large pool of functional LTRs is present in the genome. Yet, very few are known to participate in the transcription of human genes. While further bioinformatic studies will result in the discovery of additional cellular genes that are influenced or controlled by HERVs, some LTRs with functional promoters are probably not involved in chimeric transcription. While the presence or absence of nearby genes has been suggested as a mitigating factor in whether or not HERVs contribute to the generation of chimeric transcripts, our genomic analysis of the EDNRB and MID1 loci indicates that the LTRs that participate in the expression of these genes are respectively situated over 50 and 20 kb away. Another possibility to explain the lack of chimeric involvement of some retroviral elements is that they no longer contain a functional splice donor (SD) site due to deletions or mutations.
Alternatively, the retroviral transcripts of others might be more likely to fuse to downstream genes due to the absence of the retroviral splice acceptor (SA) or because sequences modulating the splicing of the gene might have been modified. Global analyses of chimeric mRNAs and the genomic sequences from which they are derived could result in interesting leads for these queries. Preliminary bioinformatic results from our lab suggest that 87 annotated transcripts in Refseq begin in an LTR (P. Medstrand, pers. comm.).

Another interesting question relates to the placental-specific activity of several retroviral elements. Some HERVs are speculated to have played a role in the evolution and divergence of placental mammals (Harris 1998). More specifically, the expression of HERV-driven transcripts encoding PTN and syncytin has been implicated in the normal development of placenta as a result of their respective invasive and fusogenic properties (Schulte et al. 1996; Mi et al. 2000). It is also possible that the placental transcription of the Cyp19 gene, which was recently determined by our lab to be from an LTR (unpublished data), may be important in placental biology as it results in the production of aromatase and synthesis of estrogen in placenta. In addition, several retroviral promoters including the MID1- and EDNRB-associated LTRs are very strong in placenta. It has been hypothesized that different families of LTRs are highly expressed in placenta (see Table 1.1) as it might have been a requirement for the infection of the offspring and subsequent amplification.

For the most part, the transcription factors necessary for the placental specificity of the retroviral promoters are still unknown. Some exceptions are the Sp1 binding sites situated in the 5' LTR of the HERV-E element associated with PTN (Schulte et al. 2000) and in a leptin placental enhancer (PLE2) which is located within a class II element (Bi et al. 1997).
Another identified protein, hGCMa, binds the PLE1 retroviral enhancer of the leptin gene as well as the TSE2 placental-specific enhancer of the aromatase gene (Yamada et al. 1999). This same protein was also recently shown to be involved in the expression of HERV-W encoded syncytin via two binding sites present upstream of the LTR (Yu et al. 2002). Although we characterized regions within the MID1 and EDNRB LTRs that are necessary for strong promoter activity, the identity of the binding proteins that confer placental expression is still unconfirmed, and these proteins do not appear to be either Sp1 or hGCMa. It is possible that different HERV families, or even subgroups within families, may have evolved alternative mechanisms to gain placental expression.

Finally, the necessity and prevalence of alternative promoters, regardless of their origin, remains a relatively untapped research area. Although there is now mounting evidence that numerous mammalian genes possess alternative promoters, their identification and characterization in the majority of genes is largely ignored. We started to address this deficiency for one gene and determined the utilization of multiple promoters in various tissues as well as in different species for the Opitz syndrome gene, MID1. However, the characterization of alternative promoters for the bulk of the human transcriptome awaits further research.

In conclusion, it is becoming apparent that the transcriptional control of human genes is very complex. Mechanisms regulating the tissue-specificity and expression levels of cellular transcripts, such as the utilization of multiple promoters, are now starting to be elucidated for many genes. Results from my project confirm a role for retroviral elements in the evolution of composite gene regulation and indicate that some LTRs possess a biological function by participating in the expression of human genes.

Bibliography

Agrawal, P., Q.M. Eastman, and D.G. Schatz. 1998.
Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature 394: 744-751.
Altschul, S.F., T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
Anusaksathien, O., C. Laplace, X. Li, Y. Ren, L. Peng, S.R. Goldring, and D.L. Galson. 2001. Tissue-specific and ubiquitous promoters direct the expression of alternatively spliced transcripts from the calcitonin receptor gene. J Biol Chem 276: 22663-74.
Arai, H., K. Nakao, K. Takaya, K. Hosoda, Y. Ogawa, S. Nakanishi, and H. Imura. 1993. The human endothelin-B receptor gene; structural organization and chromosomal assignment. J Biol Chem 268: 3463-3470.
Ayoubi, T.A. and W.J. Van De Ven. 1996. Regulation of gene expression by alternative promoters. Faseb J 10: 453-60.
Baban, S., J.D. Freeman, and D.L. Mager. 1996. Transcripts from a novel human KRAB zinc finger gene contain spliced Alu and endogenous retroviral segments. Genomics 33: 463-72.
Bailey, J.A., J. Carrel, A. Chakravarti, and E.E. Eichler. 2000. Molecular evidence for a relationship between LINE-1 elements and X chromosome inactivation: the Lyon repeat hypothesis. Proc Natl Acad Sci USA 97: 6634-6639.
Banki, K., D. Halladay, and A. Perl. 1994. Cloning and expression of the human gene for transaldolase. J Biol Chem 269: 2847-2851.
Barbulescu, M., G. Turner, M.I. Seaman, A.S. Deinard, K.K. Kidd, and J. Lenz. 1999. Many human endogenous retrovirus K (HERV-K) proviruses are unique to humans. Current Biol 9: 861-868.
Bartolome, C., X. Maside, and B. Charlesworth. 2002. On the abundance and distribution of transposable elements in the genome of Drosophila melanogaster. Mol Biol Evol 19: 926-937.
Batzer, M.A., P.L. Deininger, U. Hellmann-Blumberg, J. Jurka, D. Labuda, C.M. Rubin, C.W. Schmid, E. Zietkiewicz, and E. Zuckerkandl. 1996.
Standardized nomenclature for Alu repeats. J Mol Evol 42: 3-6.
Baust, C., W. Seifarth, U. Schon, R. Hehlmann, and C. Leib-Mosch. 2001. Functional activity of HERV-K-T47D-related long terminal repeats. Virology 283: 262-72.
Benit, L., N. de Parseval, J.-F. Casella, I. Callebaut, A. Cordonnier, and T. Heidmann. 1997. Cloning of a new murine endogenous retrovirus, MuERV-L, with strong similarity to the human HERV-L element and with a gag coding sequence closely related to the Fv1 restriction site. J Virol 71: 5652-5657.
Benit, L., J.-F. Lallemand, J.-F. Casella, H. Philippe, and T. Heidmann. 1999. ERV-L elements: a family of endogenous retrovirus-like elements active throughout the evolution of mammals. J Virol 73: 3301-3308.
Bennetzen, J.L., P. SanMiguel, M. Chen, A. Tikhonov, M. Francki, and Z. Avramova. 1998. Grass genomes. Proc Natl Acad Sci USA 95: 1975-1978.
Berkhout, B., M. Jebbink, and J. Zsiros. 1999. Identification of an active reverse transcriptase enzyme encoded by a human endogenous HERV-K retrovirus. J Virol 73: 2365-2375.
Best, S., P.R. Le Tissier, and J.P. Stoye. 1997. Endogenous retroviruses and the evolution of resistance to retroviral infection. Trends Microbiol 5: 313-318.
Bi, S., O. Gavrilova, D.-W. Gong, M.M. Mason, and M. Reitman. 1997. Identification of a placental enhancer for the human leptin gene. J Biol Chem 272: 30583-30588.
Blanco, P., M. Shlumukova, C.A. Sargent, M.A. Jobling, N. Affara, and M.E. Hurles. 2000. Divergent outcomes of intrachromosomal recombination on the human Y chromosome: male infertility and recurrent polymorphism. J Med Genet 37: 752-758.
Blomberg, J., D. Ushameckis, and P. Jern. Submitted. Evolutionary aspects of human endogenous retroviral sequences (HERVs) and disease.
Blond, J.-L., F. Beseme, L. Duret, O. Bouton, F. Bedin, H. Perron, B. Mandrand, and F. Mallet. 1999.
Molecular characterization and placental expression of HERV-W, a new human endogenous retroviral family. J Virol 73: 1175-1185.
Blond, J.-L., V. Cheynet, and F. Mallet. 2001. Signification biologique des retrovirus endogenes humains. Virologie 5: 91-111.
Blond, J.-L., D. Lavillette, V. Cheynet, O. Bouton, G. Oriol, S. Chapel-Fernandes, B. Mandrand, F. Mallet, and F.-L. Cosset. 2000. An envelope glycoprotein of the human endogenous retrovirus HERV-W is expressed in the human placenta and fuses cells expressing the type D mammalian retrovirus receptor. J Virol 74: 3321-3329.
Boeke, J.D. and J.P. Stoye. 1997. Retrotransposons, endogenous retroviruses, and evolution of retroelements. In Retroviruses (ed. J.M. Coffin, S.H. Hughes, and H.E. Varmus), pp. 343-436. Cold Spring Harbor Laboratory Press, New York.
Boese, A., U. Galli, B. Best, H. Herbst, J. Mayer, E. Kremmer, K. Roemer, and N. Mueller-Lantzsch. 2000. Human endogenous retrovirus protein cORF supports cell transformation and associates with the promyelocytic leukemia zinc finger protein. Oncogene 19: 4328-4336.
Boller, K., H. Konig, M. Sauter, N. Mueller-Lantzsch, R. Lower, J. Lower, and R. Kurth. 1993. Evidence that HERV-K is the endogenous retrovirus sequence that codes for the human teratocarcinoma-derived retrovirus HTDV. Virology 196: 349-353.
Bonham, K., S.A. Ritchie, S.M. Dehm, K. Snyder, and F.M. Boyd. 2000. An alternative, human SRC promoter and its regulation by hepatic nuclear factor-1 alpha. J Biol Chem 275: 37604-37611.
Borgese, N., A. D'Arrigo, M. De Silvestris, and G. Pietrini. 1993. NADH-cytochrome b5 reductase and cytochrome b5 isoforms as models for the study of post-translational targeting to the endoplasmic reticulum. FEBS Lett 325: 70-5.
Briggs, M.R., J.T. Kadonaga, S.P. Bell, and R. Tijan. 1986. Purification and biochemical characterization of the promoter-specific transcription factor, Sp1. Science 234: 47-52.
Brooks, J.K., C.O. Leonard, J.K. Zawadzki, A.
K. Ommaya, B.A. Levy, and J.M. Orenstein. 1998. Pituitary macroadenoma and cranial osteoma in a manifesting heterozygote with the Opitz G/BBB syndrome. Am J Med Genet 80: 291-3.
Brosius, J. 1999. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238: 115-134.
Brown, P.O. 1997. Integration. In Retroviruses (ed. J.M. Coffin, S.H. Hughes, and H.E. Varmus), pp. 161-204. Cold Spring Harbor Laboratory Press, New York.
Cainarca, S., S. Messali, A. Ballabio, and G. Meroni. 1999. Functional characterization of the Opitz syndrome gene product (midin): evidence for homodimerization and association with microtubules throughout the cell cycle. Hum Mol Genet 8: 1387-96.
Carrasquillo, M.M., A.S. McCallion, E.G. Puffenberger, C.S. Kashuk, N. Nouri, and A. Chakravarti. 2002. Genome-wide association study and mouse model identify interaction between RET and EDNRB pathways in Hirschsprung disease. Nat Genet 32: 237-243.
Chakravarti, A. 1996. Endothelin receptor-mediated signaling in Hirschsprung disease. Hum Mol Gen 5: 303-307.
Chen, H.-J., K. Carr, R.E. Jerome, and W.J. Edenberg. 2002. A retroviral repetitive element confers tissue-specificity to the human alcohol dehydrogenase 1C (ADH1C) gene. DNA Cell Biol 21: 793-801.
Cheng, K.W. 2001. Functional mapping of a placenta-specific upstream promoter for human gonadotropin-releasing hormone receptor gene. Endocrinology 142: 1506-1516.
Chretien, S., A. Dubart, D. Beaupain, N. Raich, B. Grandchamp, J. Rosa, M. Goossens, and P.H. Romeo. 1988. Alternative transcription and splicing of the human porphobilinogen deaminase gene result either in tissue-specific or in housekeeping expression. Proc Natl Acad Sci USA 85: 6-10.
Cohen, M., N. Kato, and E. Larsson. 1988. ERV3 human endogenous provirus mRNAs are expressed in normal and malignant tissues and cells, but not in choriocarcinoma tumor cells. J Cel Bioch 36.
Cohen, M., M. Powers, C.
O'Connell, and N. Kato. 1985. The nucleotide sequence of the env gene from the human provirus ERV3 and isolation and characterization of an ERV3-specific cDNA. Virology 147: 449-458.
Conrad, B., R.N. Weissmahr, J. Boni, R. Arcari, J. Schupbach, and B. Mach. 1997. A human endogenous retroviral superantigen as candidate autoimmune gene in type 1 diabetes. Cell 90: 303-313.
Cordonnier, A., J.-F. Casella, and T. Heidmann. 1995. Isolation of novel human endogenous retrovirus-like elements with foamy virus-related pol sequence. J Virol 69: 5890-5897.
Cox, T.C., L.R. Allen, L.L. Cox, B. Hopwood, B. Goodwin, E. Haan, and G.K. Suthers. 2000. New mutations in MID1 provide support for loss of function as the cause of X-linked Opitz syndrome. Hum Mol Genet 9: 2553-62.
Cullen, B.R. 1998. HIV-1 auxiliary proteins: making connections in a dying cell. Cell 93: 685-692.
Dang, Q., D. Walker, S. Taylor, C. Allan, P. Chin, J. Fan, and J. Taylor. 1995. Structure of the hepatic control region of the human apolipoprotein E/C-I gene locus. J Biol Chem 270: 22577-22585.
Dawkins, R., C. Leelayuwat, S. Gaudieri, G. Tay, J. Hui, S. Cattley, P. Martinez, and J. Kulski. 1999. Genomics of the major histocompatibility complex: haplotypes, duplication, retroviruses and disease. Immunol Rev 167: 275-304.
de Parseval, N., H. Alkabbani, and T. Heidmann. 1999. The long terminal repeats of the HERV-H human endogenous retrovirus contain binding sites for transcriptional regulation by the Myb protein. J Gen Virol 80: 841-5.
Deb-Rinker, P., T.A. Klempan, E. O'Reilly, E.F. Torrey, and S.M. Singh. 1999. Molecular characterization of a MSRV-like sequence identified by RDA from monozygotic twin pairs discordant for schizophrenia. Genomics 61: 133-144.
Deininger, P.L. and M.A. Batzer. 1999. Alu repeats and human disease. Mol Gen Met 67: 183-193.
Deininger, P.L. and M.A. Batzer. 2002. Mammalian retroelements. Genome Res. 12: 1455-1465.
Deragon, J.M. and P. Capy. 2000.
Impact of transposable elements on the human genome. Ann Med 32: 264-273.
Di Cristofano, A.D., M. Strazzullo, L. Longo, and G. LaMantia. 1995. Characterization and genomic mapping of the ZNF80 locus: expression of this zinc-finger gene is driven by a solitary LTR of ERV9 endogenous retroviral family. Nucleic Acids Res 23: 2823-2830.
Domansky, A.N., E.P. Kopantzev, E.V. Snezhkov, Y.B. Lebedev, C. Leib-Mosch, and E.D. Sverdlov. 2000. Solitary HERV-K LTRs possess bi-directional promoter activity and contain a negative regulatory element in the U5 region. FEBS Lett 472: 191-5.
Duhl, D.M.J., H. Vrieling, K.A. Miller, G.L. Wolff, and G.S. Barsh. 1994. Neomorphic agouti mutations in obese yellow mice. Nat Genet 8: 59-65.
Elshourbagy, N.A., J.E. Adamou, A.W. Gagnon, H.-L. Wu, M. Pullen, and P. Nambi. 1996. Molecular characterization of a novel human endothelin receptor splice variant. J Biol Chem 271: 25300-25307.
Esnault, C., J. Maestre, and T. Heidmann. 2000. Human LINE retrotransposons generate processed pseudogenes. Nat Genet 24: 363-367.
Esterbauer, H., H. Oberkofler, F. Krempler, A.D. Strosberg, and W. Patsch. 2000. The uncoupling protein-3 gene is transcribed from tissue-specific promoters in humans but not rodents. J Biol Chem 275: 36394-36399.
Fant, M.E., L. Nanu, and R.A. Word. 1992. A potential role for endothelin-1 in human placental growth: interactions with the insulin-like growth factor family of peptides. J Clin Endocrinol Metab 74: 1158-63.
Feuchter, A. and D. Mager. 1990. Functional heterogeneity of a large family of human LTR-like promoters and enhancers. Nucl Acids Res 18: 1261-1270.
Feuchter-Murthy, A.E., J.D. Freeman, and D.L. Mager. 1993. Splicing of a human endogenous retrovirus to a novel phospholipase A2 related gene. Nucleic Acids Res 21: 135-43.
Freitas, E.M., S. Gaudieri, W.J. Zhang, J.K. Kulski, F.M. van Bockxmeer, F.T. Christiansen, and R.L. Dawkins. 2000.
Duplication and diversification of the apolipoprotein CI (APOC1) genomic segment in association with retroelements. J Mol Evol 50: 391-396.
Gattoni-Celli, S., K. Kirsch, S. Kalled, and K.J. Isselbacher. 1986. Expression of type C related endogenous retroviral sequences in human colon tumors and colon cancer cell lines. Proc Natl Acad Sci USA 83: 6127-6131.
Gaudenz, K., E. Roessler, N. Quaderi, B. Franco, G. Feldman, D.L. Gasser, B. Wittwer, J. Horst, E. Montini, J.M. Opitz, A. Ballabio, and M. Muenke. 1998. Opitz G/BBB syndrome in Xp22: mutations in the MID1 gene cluster in the carboxy-terminal domain. Am J Hum Genet 63: 703-10.
Gaudieri, S., J. Kulski, R. Dawkins, and T. Gojobori. 1999. Different evolutionary histories in two subgenomic regions of the major histocompatibility complex. Genome Res. 9: 541-549.
Gilbert, N., Lutz-Prigge, and J.V. Moran. 2002. Genomic deletions created upon LINE-1 retrotransposition. Cell 110: 315-325.
Golovkina, T.V., J.P. Dudley, and S.R. Ross. 1998. B and T cells are required for mouse mammary tumor virus spread within the mammary gland. J Immunol 161: 2375-2382.
Goodchild, N.L., D.A. Wilkinson, and D.L. Mager. 1992. A human endogenous long terminal repeat provides a polyadenylation signal to a novel, alternatively spliced transcript in normal placenta. Gene 121: 287-94.
Goodchild, N.L., D.A. Wilkinson, and D.L. Mager. 1993. Recent evolutionary expansion of a subfamily of RTVL-H human endogenous retrovirus-like elements. Virology 196: 778-88.
Griffiths, D.J. 2001. Endogenous retroviruses in the human genome sequence. Genome Biol 2: 1017.1-1017.5.
Hammarskjold, M.L., S.C. Wang, and G. Klein. 1986. High-level expression of the Epstein-Barr virus EBNA1 protein in CV1 cells and human lymphoid cells using a SV40 late replacement vector. Gene 43: 41-50.
Handwerger, S. 1995. Endothelins and the placenta. J Lab Clin Med 125: 679-681.
Harris, J.R. 1998.
Placental endogenous retrovirus (ERV): structural, functional, and evolutionary significance. BioEssays 20: 307-316.
Hellman-Blumberg, U., M.F. McCarthy-Hintz, J.M. Gatewood, and C.W. Schmid. 1993. Developmental differences in methylation of human Alu repeats. Mol Cell Biol 13: 4523-4530.
Hiom, K., M. Melek, and M. Gellert. 1998. DNA transposition by the RAG1 and RAG2 proteins: a possible source of oncogenic translocations. Cell 94: 463-470.
Hirose, Y., M. Takamatsu, and F. Harada. 1993. Presence of env genes in members of the RTVL-H family of human endogenous retrovirus-like elements. Virology 192: 52-61.
Hofmann, M., M. Harris, D. Juriloff, and T. Boehm. 1998. Spontaneous mutations in SELH/Bc mice due to insertions of early transposons: molecular characterization of null alleles at the nude and albino loci. Genomics 52: 107-109.
Hughes, D.C. 2001. Alternative splicing of the human VEFGGR-3/FLT4 gene as a consequence of an integrated human endogenous retrovirus. J Mol Evol 53: 77-79.
Hughes, J.F. and J.M. Coffin. 2001. Evidence for genomic rearrangements mediated by human endogenous retroviruses during primate evolution. Nat Genet 29: 487-489.
Jaeckel, E., S. Heringlake, D. Berger, G. Brabant, G. Hunsmann, and M.P. Manns. 1999. No evidence for association between IDDMK1,2-22, a novel isolated retrovirus, and IDDM. Diabetes 48: 209-214.
Johansen, T., T. Holm, and E. Bjorklid. 1989. Members of the RTVL-H family of human endogenous retrovirus-like elements are expressed in placenta. Gene 79: 259-267.
Johnson, K.R., S.A. Cook, L.C. Erway, A.N. Matthews, L.P. Sanford, N.E. Paradies, and R.A. Friedman. 1999. Inner ear and kidney anomalies caused by IAP insertion in an intron of the Eya1 gene in a mouse model of BOR syndrome. Hum Mol Gen 8: 645-653.
Jones, P.A. 1999. The DNA methylation paradox. Trends Genet 15: 34-37.
Jong, M.C., M.H. Hofker, and L.M. Havekes. 1999.
Role of ApoCs in lipoprotein metabolism: functional differences between ApoC1, ApoC2 and ApoC3. Arterioscler Thromb Vasc Biol 19: 472-484.
Jurka, J. 2000. Repbase update: a database and an electronic journal of repetitive elements. Trends Gen 16: 418-420.
Kadonaga, J.T., K.A. Jones, and R. Tijan. 1986. Promoter-specific activation of RNA polymerase-II transcription by SP1. Trends Biochem Sci 11: 20-24.
Kajikawa, M. and N. Okada. 2002. LINEs mobilize SINEs in the eel through a shared 3' sequence. Cell 111: 433-444.
Kanaji, T., T. Okamura, K. Osaki, M. Kuroiwa, K. Shimoda, N. Hamasaki, and Y. Niho. 1998. A common genetic polymorphism (46 C to T substitution) in the 5'-untranslated region of the coagulation factor XII gene is associated with low translation efficiency and decrease in plasma factor XII level. Blood 91: 2010-2014.
Kapitonov, V.V. and J. Jurka. 1999. The long terminal repeat of an endogenous retrovirus induces alternative splicing and encodes an additional carboxy-terminal sequence in the human leptin receptor. J Mol Evol 48: 248-51.
Kass, S.U., D. Pruss, and A.P. Wolffe. 1997. How does DNA methylation repress transcription. Trends Genet 13: 444-449.
Kato, N., S. Pfeifer-Ohlsson, M. Kato, E. Larsson, J. Rydnert, R. Ohlsson, and M. Cohen. 1987. Tissue-specific expression of human provirus ERV3 mRNA in human placenta: two of the three ERV3 mRNAs contain human cellular sequences. J Virol 61: 2182-2191.
Kato, N., K. Shimotohno, D. VanLeeuwen, and M. Cohen. 1990. Human proviral mRNAs down regulated in choriocarcinoma encode a zinc finger protein related to Kruppel. Mol Cell Biol 10: 4401-5.
Kazazian, H.H.J. 1998. Mobile elements and disease. Curr Opin Genet Dev 8: 343-350.
Kazazian, H.H.J. 1999. An estimated frequency of endogenous insertional mutations in humans. Nat Genet 22: 122.
Kazazian, H.H.J. and J.V. Moran. 1998. The impact of L1 retrotransposons on the human genome. Nat Genet 19: 19-24.
Kitamura, Y., T. Ayukawa, T. Ishikawa, T.
Kanda, and K. Yoshike. 1996. Human endogenous retrovirus K10 encodes a functional integrase. J Virol 70: 3302-3306.
Knofler, M., G. Meinhardt, S. Bauer, T. Loregger, R. Vasicek, D.J. Bloor, S.J. Kimber, and P. Husslein. 2002. Human Hand1 basic helix-loop-helix (bHLH) protein: extraembryonic expression pattern, interaction partners and identification of its transcription repressor domain. Biochem J 361: 641-651.
Knossl, M., R. Lower, and J. Lower. 1999. Expression of the human endogenous retrovirus HTDV/HERV-K is enhanced by cellular transcription factor YY1. J Virol 73: 1254-1261.
Kowalski, P.E., J.D. Freeman, and D.L. Mager. 1999. Intergenic splicing between a HERV-H endogenous retrovirus and two adjacent human genes. Genomics 57: 371-9.
Kowalski, P.E. and D.L. Mager. 1998. A human endogenous retrovirus suppresses translation of an associated fusion transcript, PLA2L. J Virol 72: 6164-8.
Kozak, M. 1991. An analysis of vertebrate mRNA sequences: intimations of translational control. J Cell Biol 115: 887-903.
Kozak, M. 1995. Adherence to the first-AUG rule when a second AUG codon follows closely upon the first. Proc Natl Acad Sci USA 92: 2662-2666.
La Mantia, G., D. Maglione, G. Pengue, A. Di Cristofano, A. Simeone, L. Lanfrancone, and L. Lania. 1991. Identification and characterization of novel human endogenous retroviral sequences preferentially expressed in undifferentiated embryonal carcinoma cells. Nucleic Acids Res 19: 1513-20.
La Mantia, G., B. Majello, A. Di Cristofano, M. Strazzullo, G. Minchiotti, and L. Lania. 1992. Identification of regulatory elements within the minimal promoter region of the human endogenous ERV9 proviruses: accurate transcription initiation is controlled by an Inr-like element. Nucl Acids Res 20: 4129-36.
Lan, M.S., A. Mason, R. Coutant, Q.-Y. Chen, A. Vargas, J. Rao, R. Gomez, S. Chalew, R. Garry, and N.K. MacLaren. 1998. HERV-K10s and immune-mediated (type 1) diabetes. Cell 95: 14-16.
Lander, E.S., L. Linton, B. Birren, C. Nusbaum, M.C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Lauer, S.J., D. Walker, N.A. Elshourbagy, C.A. Reardon, B. Levy-Wilson, and J.M. Taylor. 1988. Two copies of the human apolipoprotein C-I genes are linked closely to the apolipoprotein E gene. J Biol Chem 263: 7277-7286.
Leib-Mosch, C., M. Haltmeier, T. Werner, E.-M. Geigl, R. Brack-Werner, U. Francke, V. Erfle, and R. Hehlmann. 1993. Genomic distribution and transcription of solitary HERV-K LTR. Genomics 18: 261-269.
Leib-Mosch, C., W. Seifarth, and U. Schon. In Press. Influence of human endogenous retroviruses on cellular gene expression. In Retroviruses and primate evolution (ed. E. Sverdlov).
Lewis, R.W., R. Ganesan, K. Houtchens, L.A. Tolar, and F.M. Sheen. 1993. Transposons in place of telomeric repeats at a Drosophila telomere. Cell 75: 1083-1093.
Li, W.H. and M. Tanimura. 1987. The molecular clock runs more slowly in man than in apes and monkeys. Nature 326: 93-96.
Li, W.-H., M. Tanimura, C.C. Luo, S. Datta, and L. Chan. 1988. The apolipoprotein multigene family: biosynthesis, structure, structure-function relationships, and evolution. J Lipid Res 29: 245-271.
Liang, F., I. Holt, G. Pertea, S. Karamycheva, S.L. Salzberg, and J. Quackenbush. 2000. Gene Index analysis of the human genome estimates approximately 120,000 genes. Nat Genet 25: 239-240.
Lindeskog, M., D.L. Mager, and J. Blomberg. 1999. Isolation of a human endogenous retroviral HERV-H element with an open env reading frame. Virology 258: 441-50.
Liu, J., T.D. Prickett, E. Elliott, G. Meroni, and D.L. Brautigan. 2001. Phosphorylation and microtubule association of the Opitz syndrome protein mid-1 is regulated by protein phosphatase 2A via binding to the regulatory subunit alpha 4. Proc Natl Acad Sci USA 98: 6650-5.
Long, Q., C. Bengra, C. Li, F. Kutlar, and D.
Tuan. 1998. A long terminal repeat of the human endogenous retrovirus ERV-9 is located in the 5' boundary area of the human beta-globin locus control region. Genomics 54: 542-55.
Lower, R., K. Boller, B. Hasenmaier, C. Korbmacher, N. Mueller-Lantzsch, J. Lower, and R. Kurth. 1993. Identification of human endogenous retroviruses with complex mRNA expression and particle formation. Proc Natl Acad Sci USA 90: 4480-4484.
Lower, R., J. Lower, and R. Kurth. 1996. The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences. Proc Natl Acad Sci USA 93: 5177-84.
Lower, R., R.R. Tonjes, K. Boller, J. Denner, B. Kaiser, R.C. Phelps, J. Lower, R. Kurth, K. Badenhoop, H. Donner, K.H. Usadel, T. Miethke, M. Lapatschek, and H. Wagner. 1998. Development of insulin-dependent diabetes mellitus does not depend on specific expression of the human endogenous retrovirus HERV-K. Cell 95: 11-13.
Lyon, M.F. 1998. X-chromosome inactivation: a repeat hypothesis. Cytogenet Cell Genet 80: 133-137.
Mager, D.L., D.G. Hunter, M. Schertzer, and J.D. Freeman. 1999. Endogenous retroviruses provide the primary polyadenylation signal for two new human genes (HHLA2 and HHLA3). Genomics 59: 255-263.
Mager, D.L., K.L. McQueen, V. Wee, and J.D. Freeman. 2001. Evolution of natural killer cell receptors: coexistence of functional Ly49 and KIR genes in baboons. Curr Biol 11: 626-630.
Mager, D.L. and P. Medstrand. In Press. Retroviral repeat sequences. Encyclopedia of the Human Genome.
Magin, C., R. Lower, and J. Lower. 1999. cORF and RcRE, the Rev/Rex and RRE/RxRE homologues of the human endogenous retrovirus family HTDV/HERV-K. J Virol 73: 9496-9507.
Makalowski, W., G.A. Mitchell, and D. Labuda. 1994. Alu sequences in the coding regions of mRNA: a source of protein variability. Trends Gen 10: 188-193.
Makalowski, W., J. Zhang, and M.S. Boguski. 1996.
Comparative analysis of 1196 orthologous mouse and human full-length mRNA and protein sequences. Genome Res 6: 846-857.
Malik, H.S., S. Henikoff, and T.H. Eickbush. 2000. Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses. Genome Res. 10: 1307-1318.
Mathews, D.H., J. Sabina, M. Zuker, and D.H. Turner. 1999. Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J Mol Biol 288: 911-940.
Mayer, J., M. Sauter, A. Racz, D. Scherer, N. Mueller-Lantzsch, and E. Meese. 1999. An almost-intact human endogenous retrovirus K on human chromosome 7. Nat Genet 21: 257-258.
McMahon, L.P., C.W. Redman, and J.D. Firth. 1993. Expression of the three endothelin genes and plasma levels of endothelin in pre-eclamptic and normal gestations. Clin Sci 85: 417-424.
Medstrand, P. and J. Blomberg. 1993. Characterization of novel reverse transcriptase encoding human endogenous retroviral sequences similar to type A and type B retroviruses: differential transcription in normal human tissues. J Virol 67: 6778-6787.
Medstrand, P., J.R. Landry, and D.L. Mager. 2001. Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans. J Biol Chem 276: 1896-903.
Medstrand, P., M. Lindeskog, and J. Blomberg. 1992. Expression of human endogenous retroviral sequences in peripheral blood mononuclear cells of healthy individuals. J Gen Virol 73: 2463-2466.
Medstrand, P. and D.L. Mager. 1998. Human-specific integrations of the HERV-K endogenous retrovirus family. J Virol 72: 9782-7.
Medstrand, P., L.N. van de Lagemaat, and D.L. Mager. 2002. Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res. 12: 1483-1495.
Mi, S., X. Lee, X.-P. Li, G.M. Veldman, H. Finnerty, L. Racie, E. LaVallie, X.Y. Tang, P. Edouard, S. Howes, J.C. Keith, and J.M. McCoy. 2000.
Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature 403: 785-789.
Miki, Y., T. Katagiri, F. Kasumi, T. Yoshimoto, and Y. Nakamura. 1996. Mutation analysis in the BRCA2 gene in primary breast cancer. Nat Genet 13: 245-247.
Moran, J.V., R.J. DeBerardinis, and H.H.J. Kazazian. 1999. Exon shuffling by L1 retrotransposition. Science 283: 1530-1534.
Mueller-Lantzsch, N., M. Sauter, A. Weiskircher, K. Kramer, B. Best, M. Buck, and F. Grasser. 1993. Human endogenous retroviral element K10 (HERV-K10) encodes a full-length gag homologous 73-kDa protein and a functional protease. AIDS Res Hum Retroviruses 9: 343-350.
Munoz-Sanjuan, I., P.M. Smallwood, and J. Nathans. 2000. Isoform diversity among fibroblast growth factor homologous factors is generated by alternative promoter usage and differential splicing. J Biol Chem 275: 2589-2597.
Murnane, J.P. and J.F. Morales. 1995. Use of a mammalian interspersed repetitive (MIR) element in the coding and processing sequences of mammalian genes. Nucl Acids Res 23: 2837-2839.
Murphy, V.J., L.C. Harrison, W.A. Rudert, P. Luppi, M. Trucco, A. Fierabracci, P.A. Biro, and G.F. Bottazzo. 1998. Retroviral superantigens and type 1 diabetes mellitus. Cell 95: 9-11.
Nakamuta, M., R. Takayanagi, Y. Sakai, S. Sakamoto, H. Hagiwara, T. Mizuno, Y. Saito, Y. Hirose, M. Yamamoto, and H. Nawata. 1991. Cloning and sequence analysis of a cDNA encoding human non-selective type of endothelin receptor. Biochem Biophys Res Commun 177: 34-39.
Nekrutenko, A. and W.-H. Li. 2001. Transposable elements are found in a large number of human protein-coding genes. Trends Gen 17: 619-621.
Nelson, D.T., N.L. Goodchild, and D.L. Mager. 1996. Gain of Sp1 sites and loss of repressor sequences associated with a young, transcriptionally active subset of HERV-H endogenous long terminal repeats. Virology 220: 213-8.
Nicholas, K.B., H.B.J. Nicholas, and D.W.I. Deerfield. 1997.
GeneDoc: Analysis and Visualization of Genetic Variation. EMBNEW.NEWS 4: 14.
Ogawa, Y., K. Nakao, H. Arai, O. Nakagawa, K. Hosoda, S. Suga, S. Nakanishi, and H. Imura. 1991. Molecular cloning of a non-isopeptide-selective human endothelin receptor. Biochem Biophys Res Commun 178: 248-25.
Opitz, J.M. 1987. G syndrome (hypertelorism with esophageal abnormality and hypospadias, or hypospadias-dysphagia, or Opitz-Frias or Opitz-G syndrome)--perspective in 1987 and bibliography. Am J Med Genet 28: 275-85.
Ostertag, E.M. and H.H.J. Kazazian. 2001. Biology of mammalian L1 retrotransposons. Annu Rev Genet 35: 501-538.
Pantel, J., K. Machini, M.-L. Sobrier, P. Duquesnoy, M. Goossens, and S. Amselem. 2000. Species-specific alternative splice mimicry at the growth hormone receptor locus revealed by the lineage of retroelements during primate evolution. J Biol Chem 275: 18664-18669.
Pardue, M.L., O.N. Danilevskaya, K. Lowenhaupt, F. Slot, and K.L. Traverse. 1996. Drosophila telomeres: new views on chromosome evolution. Trends Gen 12: 48-52.
Pastorcic, M., S. Birnbaum, and J.E. Hixson. 1992. Baboon apolipoprotein C-I: cDNA and gene structure and evolution. Genomics 13: 368-374.
Patience, C., Y. Takeuchi, and R.A. Weiss. 1997. Infection of human cells by an endogenous retrovirus of pigs. Nat Med 3: 282-286.
Perron, H., J.A. Garson, F. Bedin, F. Beseme, G. Parahnos-Baccala, F. Komurian-Pradel, F. Mallet, P.W. Tuke, C. Voisset, J.-L. Blond, B. Lalande, J.M. Seigneurin, B. Mandrand, and the Collaborative Research Group on Multiple Sclerosis. 1997. Molecular identification of a novel retrovirus repeatedly isolated from patients with multiple sclerosis. Proc Natl Acad Sci USA 94: 7583-7588.
Perry, J., S. Feather, A. Smith, S. Palmer, and A. Ashworth. 1998. The human FXY gene is located within Xp22.3: implications for evolution of the mammalian X chromosome. Hum Mol Genet 7: 299-305.
Pesole, G., S. Liuni, G. Grillo, M. Ippedico, A. Larizza, W. Makalowski, and C. Saccone. 1999.
UTRdb: a specialized database of 5' and 3' untranslated regions of eukaryotic mRNAs. Nucleic Acids Res 27: 188-191.
Pickeral, O.K., W. Makalowski, M.S. Boguski, and J.D. Boeke. 2000. Frequent human genomic DNA transduction driven by LINE-1 retrotransposition. Genome Res. 10: 411-415.
Plant, K.E., S.J.E. Routledge, and N.J. Proudfoot. 2001. Intergenic transcription in the human β-globin gene cluster. Mol Cell Biol 21: 6507-6514.
Quaderi, N.A., S. Schweiger, K. Gaudenz, B. Franco, E.I. Rugarli, W. Berger, G.J. Feldman, M. Volta, G. Andolfi, S. Gilgenkrantz, R.W. Marion, R.C. Hennekam, J.M. Opitz, M. Muenke, H.H. Ropers, and A. Ballabio. 1997. Opitz G/BBB syndrome, a defect of midline development, is due to mutations in a new RING finger gene on Xp22. Nat Genet 17: 285-91.
Rabson, A.B. and B.J. Graves. 1997. Synthesis and processing of viral RNA. In Retroviruses (ed. J.M. Coffin, S.H. Hughes, and H.E. Varmus), pp. 205-262. Cold Spring Harbor Laboratory Press, New York.
Rabson, A.B., Y. Hamagishi, P.E. Steele, M. Tykocinski, and M.A. Martin. 1985. Characterization of human endogenous retroviral envelope RNA transcripts.
Rabson, A.B., P.E. Steele, C.F. Garon, and M.A. Martin. 1983. mRNA transcripts related to full-length endogenous retroviral DNA in human cells. Nature 306: 604-607.
Repaske, R., P.E. Steele, R.R. O'Neill, A.B. Rabson, and M.A. Martin. 1985. Nucleotide sequence of a full-length human endogenous retroviral segment. J Virol 54: 764-772.
Riley, P., L. Anson-Cartwright, and J.C. Cross. 1998. The Hand1 bHLH transcription factor is essential for placentation and cardiac morphogenesis. Nat Genet 18: 271-275.
Robin, N.H., J.M. Opitz, and M. Muenke. 1996. Opitz G/BBB syndrome: clinical comparisons of families linked to Xp22 and 22q, and a review of the literature. Am J Med Genet 62: 305-17.
Sakurai, T., M. Yanagisawa, Y. Takuwa, H. Miyazaki, S. Kimura, K. Goto, and T. Masaki. 1990.
Cloning of a cDNA encoding a non-isopeptide-selective subtype of the endothelin receptor. Nature 348: 732-735.
Samuelson, L.C., R.S. Phillips, and L.J. Swanberg. 1996. Amylase gene structures in primates: retroposon insertions and promoter evolution. Mol Biol Evol 13: 767-79.
Samuelson, L.C., K. Wiebauer, C.M. Snow, and M.H. Meisler. 1990. Retroviral and pseudogene insertion sites reveal the lineage of human salivary and pancreatic amylase genes from a single gene during primate evolution. Mol Cell Biol 10: 2513-20.
Sassaman, D.M., B.A. Dombroski, J.V. Moran, M.L. Kimberland, T.P. Naas, R.J. DeBerardinis, A. Gabriel, G.D. Swergold, and H.H.J. Kazazian. 1997. Many human L1 elements are capable of retrotransposition. Nat Genet 16: 37-43.
Sauter, M., S. Schommer, E. Kremmer, K. Remberger, G. Dolken, I. Lemm, M. Buck, B. Best, D. Neumann-Haefelin, and N. Mueller-Lantzsch. 1995. Human endogenous retrovirus K10: expression of gag protein and detection of antibodies in patients with seminomas. J Virol 69: 414-421.
Schmid, C.W. 1991. Human Alu subfamilies and their methylation revealed by blot hybridization. Nucleic Acids Res 19: 5613-5617.
Schon, U., W. Seifarth, C. Baust, C. Hohenadl, V. Erfle, and C. Leib-Mosch. 2001. Cell type-specific expression and promoter activity of human endogenous retroviral long terminal repeats. Virology 279: 280-291.
Schug, J. and G.C. Overton. 1997. TESS: Transcription Element Search Software on the WWW. Technical Report of the Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania CBIL-TR-1997-1001-v0.0.
Schulte, A.M., S. Lai, A. Kurtz, F. Czubayko, A.T. Riegel, and A. Wellstein. 1996. Human trophoblast and choriocarcinoma expression of the growth factor pleiotrophin attributable to germ-line insertion of an endogenous retrovirus. Proc Natl Acad Sci USA 93: 14759-64.
Schulte, A.M., C. Malerczyk, R. Cabal-Manzano, J.J. Gajarsa, H.J. List, A.T.
Riegel, and A. Wellstein. 2000. Influence of the human endogenous retrovirus-like element HERV-E.PTN on the expression of growth factor pleiotrophin: a critical role of a retroviral Sp1-binding site. Oncogene 19: 3988-98.
Schulte, A.M. and A. Wellstein. 1998. Structure and phylogenetic analysis of an endogenous retrovirus inserted into the human growth factor gene pleiotrophin. J Virol 72: 6065-72.
Schweiger, S., J. Foerster, T. Lehmann, V. Suckow, Y.A. Muller, G. Walter, T. Davies, H. Porter, H. van Bokhoven, P.W. Lunt, P. Traub, and H.H. Ropers. 1999. The Opitz syndrome gene product, MID1, associates with microtubules. Proc Natl Acad Sci USA 96: 2794-9.
Segal, R. and A.J. Berk. 1991. Promoter activity and distance constraints of one versus two Sp1 binding sites. J Biol Chem 266: 20406-20411.
Seifarth, W., C. Baust, A. Murr, H. Skladny, F. Kreig-Schneider, J. Blusch, T. Werner, R. Hehlmann, and C. Leib-Mosch. 1998. Proviral structure, chromosomal location, and expression of HERV-K-T47D, a novel human endogenous retrovirus derived from T47D particles. J Virol 72: 8384-8391.
Seifarth, W., H. Skladny, F. Kreig-Schneider, A. Reichert, R. Hehlmann, and C. Leib-Mosch. 1995. Retrovirus-like particles released from the human breast cancer cell line T47D display type B and C related endogenous retroviral sequences. J Virol 69: 6408-6416.
Shigematsu, K., A. Nakatani, K. Kawai, R. Moriuchi, S. Katamine, T. Miyamoto, and M. Niwa. 1996. Two subtypes of endothelin receptors and endothelin peptides are expressed in differential cell types of the rat placenta: in vitro receptor autoradiographic and in situ hybridization studies. Endocrinology 137: 738-748.
Shih, A., E.E. Coutavas, and M.G. Rush. 1991. Evolutionary implications of primate endogenous retroviruses. Virology 182: 495-502.
Sibley, C.G. and J.E. Ahlquist. 1987. DNA hybridization evidence of hominoid phylogeny: results from an expanded data set. J Mol Evol 26: 99-121.
Sjottem, E., S. Anderssen, and T. Johansen. 1996. The promoter activity of long terminal repeats of the HERV-H family of human retrovirus-like elements is critically dependent on Sp1 family proteins interacting with a GC/GT box located immediately 3' to the TATA box. J Virol 70: 188-98.
Smit, A.F.A. 1993. Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucl Acids Res 21: 1863-1872.
Smit, A.F.A. 1995. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Nucl Acids Res 23: 98-102.
Smit, A.F.A. 1996. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev 6: 743-748.
Smit, A.F.A. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev 9: 657-663.
Smit, A.F.A. and P. Green. 1999. RepeatMasker. Unpublished.
Smit, A.F.A. and A.D. Riggs. 1996. Tiggers and other DNA transposon fossils in the human genome. Proc Natl Acad Sci USA 93: 1443-1448.
Smit, M., E. van der Kooij-Meis, R.R. Frants, L. Havekes, and E.C. Klasen. 1988. Apolipoprotein gene cluster on chromosome 19. Definite localization of the APOC2 gene and the polymorphic HpaI site associated with type III hyperlipoproteinemia. Hum Genet 78: 90-93.
Sobczak, K. 2002. Structural determinants of BRCA1 translational regulation. J Biol Chem 277: 17349-58.
Sorek, R., G. Ast, and D. Graur. 2002. Alu-containing exons are alternatively spliced. Genome Res 12: 1060-1067.
Speek, M. 2001. Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes. Mol Cell Biol 21: 1973-1985.
Stoye, J.P., S. Fenner, G.E. Greenoak, C. Moran, and J.M. Coffin. 1988. Role of endogenous retroviruses as mutagens: the hairless mutation of mice. Cell 54: 383-391.
Sun, C., H. Skaletsky, S. Rozen, J. Gromoll, E. Nieschlag, R. Oates, and D.C. Page. 2000.
Deletion of azoospermia factor a (AZFa) region of human Y chromosome caused by recombination between HERV15 proviruses. Hum Mol Gen 9: 2291-2296.
Sverdlov, E.D. 2000. Retroviruses and primate evolution. Bioessays 22: 161-71.
Symer, D.E., C. Connely, S.T. Szak, E.M. Caputo, G.J. Cost, G. Parmigiani, and J.D. Boeke. 2002. Human L1 retrotransposition is associated with genetic instability in vivo. Cell 110: 327-338.
Tatiana, A.T. and T.L. Madden. 1999. Blast 2 sequences - a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 174: 247-250.
Teerink, H., M.A. Kasperaitis, C.H. De Moor, H.O. Voorma, and A.A. Thomas. 1994. Translation initiation on the insulin-like growth factor II leader 1 is developmentally regulated. Biochem J 303: 547-53.
Thompson, J.D., D.G. Higgins, and T.J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
Ting, C.N., M.P. Rosenberg, C.M. Snow, L.C. Samuelson, and M.H. Meisler. 1992. Endogenous retroviral sequences are required for tissue-specific expression of a human salivary amylase gene. Genes Dev 6: 1457-65.
Tomilin, N.V. 1999. Control of genes by mammalian retroposons. Int Rev Cytol 186: 1-48.
Tonjes, R.R., C. Limbach, R. Lower, and R. Kurth. 1997. Expression of human endogenous retrovirus type K envelope glycoprotein in insect and mammalian cells. J Virol 71: 2747-2756.
Tristem, M. 2000. Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J Virol 74: 3715-3730.
Trockenbacher, A., V. Suckow, J. Foerster, J. Winter, S. Krauss, H.H. Ropers, R. Schneider, and S. Schweiger. 2001. MID1, mutated in Opitz syndrome, encodes an ubiquitin ligase that targets phosphatase 2A for degradation. Nat Genet 29: 287-94.
Tsutsumi, M., G.
Liang, and P.A. Jones. 1999. Novel endothelin B receptor transcripts with the potential of generating a new receptor. Gene 228: 43-49.
Turner, G., M. Barbulescu, M. Su, M.I. Jensen-Seaman, K.K. Kidd, and J. Lenz. 2001. Insertional polymorphisms of full-length endogenous retroviruses in humans. Curr Biol 11: 1531-1535.
Ullu, E., S. Murphy, and M. Melli. 1982. Human 7SL RNA consists of a 140 nucleotide middle-repetitive sequence inserted in an Alu sequence. Cell 29: 195-202.
Valhmu, W.B., G.D. Palmer, J. Dobson, S.G. Fischer, and A. Ratcliffe. 1998. Regulatory activities of the 5'- and 3'-untranslated regions and promoter of the human aggrecan gene. J Biol Chem 273: 6196-6202.
Van den Veyver, I.B., T.A. Cormier, V. Jurecic, A. Baldini, and H.Y. Zoghbi. 1998. Characterization and physical mapping in human and mouse of a novel RING finger gene in Xp22. Genomics 51: 251-61.
Venables, P.J.W., S.M. Brookes, D. Griffiths, R.A. Weiss, and M.T. Boyd. 1995. Abundance of an endogenous retroviral envelope protein in placental trophoblasts suggests a biological function. Virology 211: 589-592.
Vogt, V.M. 1997. Retroviral virions and genomes. In Retroviruses (ed. J.M. Coffin, S.H. Hughes, and H.E. Varmus), pp. 27-70. Cold Spring Harbor Laboratory Press, New York.
Wallace, M.R., L.B. Andersen, A.M. Saulino, P.E. Gregory, T.W. Glover, and F.S. Collins. 1991. A de novo Alu insertion results in neurofibromatosis type 1. Nature 353: 864-866.
Walsh, C.P. and T.H. Bestor. 1999. Cytosine methylation and mammalian development. Genes Dev 13: 26-34.
Walsh, C.P., J.R. Chaillet, and T.H. Bestor. 1998. Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat Genet 20: 116-117.
Wang, C.D., G.D. Chang, Y.K. Lee, and H. Chen. 2001. A functional composite cis-element for NF kappa B and RBJ kappa in the rat pregnancy-specific glycoprotein gene. Biol Reprod 65: 1437-1443.
Wang, Z. and S. Melmed. 1998.
Functional map of a placenta-specific enhancer of the human leukemia inhibitory factor receptor gene. J Biol Chem 273: 26069-26077.
Watanabe, K., C.A. Kessler, C.J. Bachurski, Y. Kanda, B.D. Richardson, J. Stanek, S. Handwerger, and A.K. Brar. 2001. Identification of a decidua-specific enhancer on the human prolactin gene with two critical activator protein 1 (AP-1) binding sites. Mol Endocr 15: 638-653.
Waterston, R.H., K. Lindblad-Toh, E. Birney, J. Rogers, J.F. Abril, P. Agarwal, R. Agarwala, R. Ainscough, M. Alexandersson, P. An, S.E. Antonarakis, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.
Weichenrieder, O., K. Wild, K. Strub, and S. Cusack. 2000. Structure and assembly of the Alu domain of the mammalian signal recognition particle. Nature 408: 167-173.
Whitelaw, E. and D.I.K. Martin. 2001. Retrotransposons as epigenetic mediators of phenotypic variation in mammals. Nat Genet 27: 361-365.
Wilhide, C.C., Y. Jin, Q. Guo, L. Li, S.X. Li, E. Rubin, and P.F. Bray. 1997. The human integrin beta3 gene is 63 kb and contains a 5'-UTR sequence regulating expression. Blood 90: 3951-3961.
Wilkinson, D.A., J.D. Freeman, N.L. Goodchild, C.A. Kelleher, and D.L. Mager. 1990. Autonomous expression of RTVL-H endogenous retroviruslike elements in human cells. J Virol 64: 2157-67.
Wilkinson, D.A., D.L. Mager, and J.-A.C. Leong. 1994. Endogenous Human Retroviruses. In The Retroviridae (ed. J.A. Levy), pp. 465-535. Plenum Press, New York.
Yamada, K., H. Ogawa, S.I. Honda, N. Harada, and T. Okazaki. 1995. Regulation of placenta-specific expression of the aromatase cytochrome P-450 gene. J Biol Chem 270: 25064-25069.
Yamada, K., H. Ogawa, S.I. Honda, N. Harada, and T. Okazaki. 1999. A GCM motif protein is involved in placenta-specific expression of human aromatase gene. J Biol Chem 274: 32279-32286.
Yang, Z., D. Boffelli, N. Boonmark, K. Schwartz, and R. Lawn. 1998.
Apolipoprotein(a) gene enhancer resides within a LINE element. J Biol Chem 273: 891-897.
Yoder, J.A., C.P. Walsh, and T.H. Bestor. 1997. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet 13: 335-340.
Yu, C., K. Shen, M. Lin, P. Chen, C. Lin, G.D. Chang, and H. Chen. 2002. GCMa regulates the syncytin-mediated trophoblastic fusion. J Biol Chem 277: 50062-50068.
Yulug, I.G., A. Yulug, and E.M.C. Fisher. 1995. The frequency and position of Alu repeats in cDNAs, as determined by database searching. Genomics 27: 544-548.
Zhang, Y. and M.A. Frohman. 1997. Using rapid amplification of cDNA ends (RACE) to obtain full-length cDNAs. Methods Mol Biol 69: 61-87.

Appendix: Repetitive elements in the 5' untranslated region of a human zinc finger gene modulate transcription and translation efficiency

This appendix is based on results from a manuscript entitled as above by J.-R. Landry, P. Medstrand and D. Mager, published in 2001 in the journal Genomics, 76: 110-116.

A.1 Introduction

The goal of this study was to test the significance of the retroelement sequences that occur in the 5' UTR of the human zinc finger gene ZNF177. This gene was originally detected in a screen of an NTera2D1 teratocarcinoma cell cDNA library for clones containing env-related sequences of the endogenous retrovirus family HERV-H (Baban et al. 1996). The two cDNA clones obtained were derived from the same gene, termed ZNF177, and had only a small segment of the env sequence incorporated into the 5' UTR of the transcripts. Analysis revealed that the ZNF177 coding region corresponded to a previously unknown C2H2 zinc finger gene that appears to be widely transcribed at a low level (Baban et al. 1996).
The 5' UTRs of both cDNA forms were found to be unusually long and to contain alternatively spliced exons, including the HERV-H env segment. Another 5' UTR exon was originally thought to be entirely derived from an Alu repeat (Baban et al. 1996) but, as shown here, recent sequencing of the genomic region has revealed that this exon is actually composed of both Alu and L1-related sequences. The unusual structure of the 5' UTR of ZNF177 raises the possibility that it could affect expression of the ZNF177 gene. Here we have investigated the significance of the different retroelements in the 5' UTR of ZNF177 and show that they affect both the transcription and translation of reporter genes.

A.2 Materials and Methods

A.2.1 Database searches and sequence comparison

To obtain the genomic sequence of ZNF177, the high throughput genomic sequences (htgs) database at the NCBI web site was searched by BLAST version 2.0 (Altschul et al. 1997) using the nucleotide sequences of U37251 and U37263 as queries. The resulting genomic sequence in Genbank Accession Number AC011451 was compared to the cDNAs using "BLAST 2 sequences" (Tatiana and Madden 1999) and was screened for repetitive elements using RepeatMasker with Repbase version 3.04 (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker). The alternatively spliced 5' UTRs of ZNF177 were searched for secondary structures using mfold version 3.1 (Mathews et al. 1999) and screened for putative transcription factor binding sites using the Transfac database version 3.3 (Schug and Overton 1997) and the Transcription Element Search System (TESS) (http://www.cbil.upenn.edu/tess/index.html). Genes containing Alu elements in their 5' UTR were identified by searching the UTR database UTRdb release 10.0 (Pesole et al. 1999), using the following queries: "5' UTR", "SINE/Alu" and "human".

A.2.2 Plasmid constructions

The 5' UTRs of ZNF177 were amplified from Tera1 cDNA subclones (Baban et al.
1996) using the following primers (5'-gagaagctTGCAGCTGAGAAAGGGTTGC-3', 5'-tacaagcttTCCTAAGCAGGCAGAGCCAT-3'). The oligonucleotides were derived from exon 1b and exon 5 sequence and contained HindIII sites (indicated in the primer sequences above in lower case) to facilitate cloning into the pGL2p (Promega) and pEGFP-N1 (Clontech) vectors. The luciferase constructs were made by introducing any one of the three variant forms of the ZNF177 5' UTRs (exon 4; exons 2 and 4; or exons 2, 3 and 4) into the HindIII site of pGL2p. The HindIII restriction site is situated upstream of the luciferase gene but downstream of the SV40 promoter in the vector. The green fluorescent protein (GFP) constructs were generated by inserting one of the alternatively spliced forms of ZNF177 (containing only exon 4) into the HindIII site of the pEGFP-N1 plasmid, upstream of the GFP gene and downstream of the cytomegalovirus (CMV) promoter. Luciferase and GFP constructs with inserts introduced in either the sense or the antisense orientation with respect to their orientation in ZNF177 were isolated. All constructs were sequenced to confirm orientation and sequence integrity using the pGL2p vector primers GLprimer1 (5'-TGTATCTTATGGTACTGTAACTG-3') and GLprimer2 (5'-CTTTATGTTTTTGGCGTCTTCCA-3') or the pEGFP-N1 sequencing primer GFP-N1 (5'-CGTCGCCGTCCAGCTCGACCAG-3').

A.2.3 Transient transfections and reporter gene assays

Cos-7 cells were cultured in Dulbecco's minimal essential medium supplemented with 10% fetal calf serum and antibiotics. Cells were seeded 48 h prior to transfection in 6-well plates at a density of 5 x 10^4 cells/well. Monolayers were transfected with 0.8 µg of plasmid DNA by DEAE-dextran (Hammarskjold et al. 1986). Cells transfected with the firefly luciferase constructs were washed 48 h later in phosphate-buffered saline (PBS) and harvested in 150 µl of 1X lysis buffer (Promega).
Luciferase activity was assayed using the Luciferase Activity System (Promega). The data were normalized for protein using the Biorad Protein Assay and were expressed as the relative expression with respect to pGL2p. Cells transfected with the GFP constructs were washed 72 h after transfection in PBS containing 2% fetal calf serum and 0.1% sodium azide and stained with 1 µg of propidium iodide. Mean fluorescence intensity was measured by flow cytometric analysis using a FACScan (Becton Dickinson). All transfections were performed a minimum of 2 times with 4 replicates in each experiment. In order to monitor the transfection efficiency of the various plasmids, a subset of luciferase constructs (#1, 3 and 4 in Fig. 3) was cotransfected (as described above) with 0.8 µg of the Renilla luciferase plasmid, pRL-CMV (Promega). Firefly and Renilla luciferase activities were quantified using the Dual-Luciferase Reporter Assay System (Promega). The co-transfection results confirmed that the different luciferase constructs had very similar transfection efficiencies, with an average difference in transfection efficiency of 6% (data not shown).

A.2.4 RT-PCR

Total RNA was extracted from transiently transfected Cos-7 cells using Trizol (Gibco BRL) according to the supplier's protocol. First-strand cDNA was synthesized as previously described (Medstrand et al. 1992) using random primers, Superscript II Reverse Transcriptase (Gibco BRL) and 1.25 µg of RNA following the elimination of remaining genomic DNA with DNAse (Gibco BRL). To ensure the linearity of the PCR reactions, three dilutions of the cDNAs were amplified at various numbers of cycles, using a set of primers for the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene (5'-CATGAGAAGTATGACAACAGCCTC-3' and 5'-GTTGCTGTAGCCAAATTCGTTGTC-3').
The signal intensity of the PCR products, after 22 cycles, indicated that the amplification reactions were in the linear range (data not shown). As a result, semi-quantitative PCR was performed on one-sixteenth volume of the cDNA, as well as on non-reverse transcribed total RNA to ensure the absence of contaminating genomic DNA, using the following conditions: 30 s at 95°C, 30 s at 62°C, and 30 s at 72°C for 22 cycles. After amplification using a set of primers for the luciferase gene (5'-CAGTCGATGTACACGTTCGTCAC-3' and 5'-CAGAGTGCTTTTGGCGAAGAATG-3') and for the GAPDH gene (as above), the PCR products were run on a 1.2% agarose gel and transferred to a Zetaprobe membrane (Biorad). The Southern blot was hybridized with a luciferase 32P-labelled oligonucleotide (5'-GGATCTCTGGCATGCGAGAATCTG-3') and subsequently a GAPDH 32P-labelled oligonucleotide (5'-CATGAGAAGTATGACAACAGCCTC-3') in 6X standard sodium citrate (SSC), 1X Denhardt's (0.02% each Ficoll, polyvinylpyrrolidone, bovine serum albumin fraction V), 0.5% sodium dodecyl sulphate (SDS) at 68°C and 65°C respectively, and washed for 10 min at RT in 3X SSC and 1% SDS. The intensity of the hybridization was measured using a phosphorimager and the ImageQuant software (Molecular Dynamics). As an internal control for the amount of RNA, the luciferase intensity of each sample was normalized to the respective intensity of the GAPDH hybridization. The data were then expressed relative to the intensity obtained with the RNA from Cos-7 cells that had been transfected with pGL2p (Promega).

A.2.5 Northern blot analysis

Five µg of DNAse treated RNA (extracted as above) from transiently transfected Cos-7 cells was run on a 1.2% agarose, 5% (V/V) formaldehyde, 1X MOPS buffer (20 mM morpholinopropanesulfonic acid, 5 mM sodium acetate, 0.1 mM EDTA pH 8.0) gel and
transferred to a Zetaprobe membrane (Biorad). The Northern blot was hybridized in ExpressHyb (Clontech) at 68°C with a 427 bp NdeI-HindIII pEGFP-N1 fragment corresponding to a segment of the 5' UTR of the GFP transcript. The membrane was washed twice for 15 min in 2X SSC, 0.05% SDS at RT followed by 2 washes of 25 min in 0.1X SSC, 0.1% SDS at 50°C. To confirm the amounts of mRNA loaded in each lane, the blots were rehybridized with a human 1.9 kb actin cDNA fragment. The intensity of the hybridization was then normalized to that of the actin hybridization (see above). The data were expressed relative to the intensity obtained with the RNA from Cos-7 cells that had been transfected with pEGFP-N1.

A.3 Results

A.3.1 Characterization of the ZNF177 genomic locus

To determine the structure of the ZNF177 locus, including the location in the genome of the alternatively spliced Alu and HERV sequences that are present in the 5' UTR of ZNF177 transcripts, we searched Genbank using ZNF177 transcripts as query sequences. The ZNF177 gene was found in a contig (Genbank Accession number AC011451) derived from chromosome 19. Analysis of the genomic sequence revealed that the gene spans 58.7 kb on 19p13 and possesses, in its full length form, 9 exons, the last of which contains the zinc finger motifs (Figure A.1). Although very similar, the two originally identified ZNF177 cDNA clones, SB1 and SB2 (Baban et al. 1996), were found to differ in their 5' ends as well as in the sequence coding for the zinc fingers (see Figure A.1). While the SB1 clone potentially codes for a 221 amino acid peptide (Baban et al. 1996), the SB2 clone contains imperfect zinc finger motifs which disrupt the open reading frame. Genomic comparison showed that the use of an alternative splice acceptor site created a larger exon 9 in the SB2 clone, leading to the incorporation of degenerate zinc finger copies and interruption of the open reading frame.
Based on the genomic analysis, several characteristics of the 5' UTR of ZNF177 were also identified. We found that the different first exons of the ZNF177 transcripts in clones SB1 and SB2 most likely reflect usage of alternative promoters, as exon 1a (of clone SB2) is found 40 kb upstream of exon 2 while exon 1b (of SB1) is located less than 2 kb from the second exon. In addition, we confirmed that exon 2, which is alternatively spliced and present in the majority of transcripts, is entirely derived from the HERV-H element. At the genomic level, this partially deleted HERV consists only of envelope and leader sequence and is integrated in the antisense orientation with respect to the ZNF177 gene.

Figure A.1 Genomic organization and alternative transcripts of ZNF177. The top panel represents the gene structure of ZNF177 in which the exons are numbered from the 5' end and depicted as boxes. Exon variants specific to the SB1 clone are referred to as b while exons specific to the SB2 clone are referred to as a. The zinc finger motifs are present in exon 9; exon 9a incorporates imperfect zinc fingers while exon 9b only contains the perfect motifs. Only repetitive elements involved in exon formation are shown. HERV sequence is represented by a checkered motif, L1 sequence by a shaded pattern and Alu sequence by horizontal lines. The bottom panel shows the three alternatively spliced 5' UTR forms of ZNF177 mRNAs, which have been identified for both SB1 and SB2 (Baban et al. 1996). All detected transcripts contain exon 4, composed of both Alu and L1 segments. The majority of ZNF177 mRNAs also include exon 2, derived from a HERV-H envelope gene.
Finally, we found that exon 4, which is present in all three alternatively spliced transcript forms and was previously thought to contain only Alu sequence (Baban et al. 1996), is actually composed of both Alu and L1 sequence. As is shown in Figure A.2, the first 74 bp of exon 4 are part of an Alu-Jo repeat (Batzer et al. 1996) while the last 34 bp are sequence from an L1 element. Inspection of the genomic locus revealed that this Alu element integrated within the L1 element. As is the case for the HERV-H element, both the Alu and L1 elements integrated in the opposite transcriptional direction relative to the ZNF177 gene.

A.3.2 Modification of reporter protein level by the 5' UTR of ZNF177

As mentioned above, it is known that the 5' UTRs of some transcripts contribute to the regulation of gene expression. To investigate whether the retroelement sequences present in the 5' UTR of ZNF177 influence gene expression, we made a variety of constructs containing the three alternatively spliced forms of the 5' UTRs identified previously (Baban et al. 1996). As indicated in Figure A.3(A), the 5' UTRs were inserted in both orientations downstream of an SV40 promoter but upstream from the luciferase or GFP coding sequence. The 5' UTR-luciferase constructs were transiently transfected into Cos-7 cells and luciferase activity was measured. The results, shown in Figures A.3(B) and (C), revealed that the repetitive elements in the 5' UTR of ZNF177 altered the expression of the reporter gene and that these differences were statistically significant (P < 0.001 by Student's t-test). The data obtained with the sense constructs (+) indicated that the presence of any one of the three alternatively spliced 5' UTRs in the same orientation as present in the ZNF177 locus

[Figure A.2 image: exon 4 with flanking intron sequence, ttgttttctgtag AGATG....GATGG gtaagtac]

Figure A.2 Characterization of exon 4.
Fortuitous splice sites present within the Alu and L1 elements permit the incorporation of exon 4 in the ZNF177 transcripts. The sequence representing the splice donor site and acceptor site is given at the intron/exon boundaries. The direction of transcription of the retroelements and of the exon is indicated by arrows. As in Figure A.1, the horizontal lines depict Alu sequence while the shaded pattern represents the L1 element.

significantly decreased luciferase activity. Interestingly, the insertion of the Alu-L1 segment (exon 4) constituted a small 108 bp addition to the 5' UTR but decreased reporter gene activity by 70% compared to controls without exon 4, indicating that a determinant other than length was involved in modulating luciferase activity. The results of transfections using the antisense (-) constructs displayed an inverse correlation between the luciferase activity and the size of the insert within the 5' UTR (see Figure A.3(B)). Although the luciferase activity decreased as the insert length increased, the activity was surprisingly higher than that of the positive control (vector with no inserts in the 5' UTR), suggesting that a mechanism more complex than simply the length of the 5' UTR was affecting luciferase activity. To confirm the results with another assay system, we performed transfections using constructs where exon 4 sequences were fused to GFP. Supporting the luciferase data, the presence of the Alu/L1 (exon 4) segment in the 5' UTR of the GFP plasmid reduced the expression of the reporter gene when present in the same orientation as in ZNF177 (Figure A.3(C)).

A.3.3 Alteration of reporter RNA level by ZNF177 5' UTR

The decrease in overall reporter gene expression observed with constructs containing any one of the three alternatively spliced 5' UTRs in the same orientation as they are found in ZNF177 could be caused by transcriptional or post-transcriptional changes.
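The relative reporter activities discussed above (protein-normalized readings expressed relative to the control vector, with a mean and standard deviation over replicate transfections, as described in section A.2.3) can be illustrated with a short Python sketch. The function name and every reading below are hypothetical and are not data from this study; the sketch only assumes the normalization scheme stated in the Materials and Methods.

```python
from statistics import mean, stdev

def relative_activity(sample, control):
    """Mean and SD of protein-normalized reporter signal relative to the control vector.

    sample, control: lists of (reporter_signal, protein_amount) pairs,
    one pair per replicate transfection.
    """
    # Average protein-normalized signal of the control vector (set to 1 by definition).
    control_mean = mean(signal / protein for signal, protein in control)
    # Express each protein-normalized sample replicate relative to that control mean.
    ratios = [(signal / protein) / control_mean for signal, protein in sample]
    return mean(ratios), stdev(ratios)

# Hypothetical readings in arbitrary units, NOT measurements from this study.
control_reads = [(1000.0, 10.0), (1200.0, 12.0), (900.0, 9.0), (1100.0, 11.0)]
exon4_sense_reads = [(300.0, 10.0), (360.0, 12.0), (280.0, 10.0), (320.0, 10.0)]

rel_mean, rel_sd = relative_activity(exon4_sense_reads, control_reads)
print(f"relative activity: {rel_mean:.2f} +/- {rel_sd:.2f}")
```

The made-up readings were chosen so that the relative activity comes out near 0.3, which is how a 70% decrease would appear on the scale used in Figure A.3.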
Figure A.3 Effect of the 5' UTRs of ZNF177 on protein levels of reporter genes. (A) Schematic representation of the ZNF177-5' UTR reporter constructs. The control plasmid is the vector pGL2p or pEGFP-N1 without any inserts. The alternatively spliced 5' UTRs of ZNF177 are ligated in the HindIII (H) site of the control plasmid, downstream of the SV40 promoter (P). HERV sequence (exon 2) is represented by a checkered motif, L1 sequence (exon 4) by a shaded pattern and Alu sequence (also in exon 4) by horizontal lines, while the empty box represents exon 3. The transcription initiation site is shown by an arrow. (B) Functional analysis of the ZNF177-5' UTR luciferase constructs in Cos-7 cells. Plasmids were tested with the 5' UTRs inserted in the same or opposite orientation as indicated by a + or - sign. The relative luciferase activity, normalized to protein concentration, is shown for each corresponding construct. The activity of the control vector, pGL2p, was set to 1. The data represent the average luciferase activity plus the standard deviation of two or more separate transfections, each performed in quadruplicate. (C) Confirmation of reporter protein level modification by transfection with ZNF177-5' UTR GFP constructs. The data are expressed as in panel B with the exception that the control vector is pEGFP-N1 and the bars represent the relative GFP fluorescence intensity observed.

To determine whether the ZNF177 retroelements modulate the protein or the RNA levels of reporter genes, we quantified the RNA levels of luciferase or GFP in cells that had been transiently transfected with the 5' UTR reporter constructs. Because of the short half-life of luciferase transcripts (Valhmu et al.
1998) we were unable to perform Northern analysis but instead relied on semi-quantitative RT-PCR (see Materials and Methods) to compare RNA levels. As is shown in Figures A.4(A) and A.4(B), the presence of the two smaller 5' UTRs in the same orientation as they are present in ZNF177 augmented the quantities of luciferase RNA. This increase in RNA levels also occurred for two of the three UTRs when present in the opposite orientation, suggesting that the 5' UTR sequence might contain an element that enhances transcription or mRNA stability in an orientation independent manner. As a control for the luciferase RNA levels, we also performed Northern analysis on RNA extracted from cells that had been transfected with the 5' UTR-GFP constructs (Figures A.4(C) and A.4(D)). As observed for the luciferase constructs, the presence of exon 4 from ZNF177 increased the measured GFP RNA levels in an orientation independent manner.

The accumulation of luciferase and GFP RNA detected in cells transfected with the 5' UTR constructs could result from the retroelement sequences enhancing reporter gene transcription or augmenting mRNA stability. To distinguish between these two mechanisms, we compared the half-life of luciferase mRNA in cells that had been transiently transfected with exon 4-luciferase constructs or the control construct and treated with 10 µg/ml of the transcription inhibitor actinomycin D. This experiment showed that the presence of exon 4 in the luciferase transcripts did not significantly affect the half-lives of the reporter mRNAs (data not shown). Thus, the increase in luciferase mRNA levels is not the result of altered mRNA stability but is likely the consequence of enhanced transcription.

Figure A.4 Effects of ZNF177 5' UTRs on reporter RNA levels.
(A) Luciferase and GAPDH RT-PCR products from the mRNAs of cells transfected with ZNF177-5' UTR luciferase constructs, as previously described and diagrammed in Figure A.3. (B) Quantification of luciferase RT-PCR products (in part A). Results are shown as the average of two separate experiments plus the standard deviation and are normalized to the hybridisation of GAPDH RT-PCR products. (C) Northern blot of GFP transcripts from cells transfected with ZNF177-5' UTR GFP constructs. (D) Quantification of the Northern blot (in part C) normalized to actin hybridisation intensity.

A.4 Discussion

The results presented here suggest that repetitive sequences present within the 5' UTR of ZNF177 transcripts affect the regulation of this gene. Our studies indicate that insertion of the retroelement sequences, particularly the Alu-L1 segment (exon 4), in the 5' UTR of reporter genes modifies their RNA and protein levels. We have shown that the presence of exon 4 in either orientation in the 5' UTR of reporter genes results in the accumulation of luciferase and GFP mRNA but that changes in mRNA stability do not account for this increase in transcript levels. Furthermore, we have found that these retroelement sequences reduce luciferase and GFP protein levels when present in the same orientation as found in ZNF177. Thus, the Alu-L1 sequence in the 5' UTR of ZNF177 likely modulates gene expression by increasing transcription while impeding translation. Although non-repetitive sequences present in the 5' UTR of some transcripts have previously been shown to contribute to the regulation of gene expression (Wilhide et al. 1997; Kanaji et al. 1998; Valhmu et al. 1998), this study is, to our knowledge, the first to report that retroelement sequences within the 5' UTR of mRNAs can influence levels of gene expression.

The 5' UTRs of genes are believed to contribute to gene regulation primarily by modulating the initiation of mRNA translation.
The efficiency of translation of transcripts has been shown to depend on the structure of the 5' UTR. The presence of a long 5' UTR with secondary structures can impede the scanning of the ribosome and result in a decrease in translation efficiency (Kozak 1991). The presence of ATG codons upstream of the genuine initiation codon can also be detrimental to translation, as initiation will take place at this "false" start codon but will most likely result in truncated proteins due to the presence of downstream terminator codons (Kozak 1995). Because the three alternatively spliced 5' UTRs of ZNF177 decrease the protein levels of reporter genes, we searched for attributes within the untranslated sequences that could explain these observations. Using mfold (Mathews et al. 1999), we identified a stem-loop structure with a free energy of -53.3 kcal/mol in the smaller alternatively spliced form of the 5' UTRs, which consists of exons 1 and 4. Since Alu repeats are derived from the 7SL RNA component of the signal recognition particle (SRP) (Ullu et al. 1982), it is possible that the Alu segment of exon 4 might form secondary structures similar to the determined conformation of the 7SL RNA (Weichenrieder et al. 2000). If so, the Alu sequence in the 5' UTR of ZNF177 could arrest translational elongation in a similar fashion to that proposed for the Alu domain of SRP. In addition to the potential secondary structures, we found that exon 4 contains two additional ATGs, as shown in Figure A.5, upstream of the presumed true initiation codon located in exon 5, but that these two ATGs are closely followed by stop codons. It is possible that translation is initiated in exon 4 but quickly terminated due to the presence of the in-frame stop codons, which could account for the observed decrease in translation efficiency of constructs containing exon 4 in their 5' UTR.
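The search for upstream ATGs followed by in-frame stop codons can be illustrated with a short script. This is only a minimal sketch and not the procedure used in this study: the function name `find_upstream_orfs` and the toy sequence are hypothetical, and the sequence is not the ZNF177 exon 4 sequence.

```python
from typing import List, Optional, Tuple

# Stop codons in DNA notation.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_upstream_orfs(utr: str) -> List[Tuple[int, Optional[int]]]:
    """Return (atg_position, stop_position) for every ATG in a 5' UTR.

    Positions are 0-based indices of the first base of each codon.
    stop_position is None when no in-frame stop codon follows the ATG
    before the end of the supplied sequence.
    """
    seq = utr.upper().replace(" ", "")
    results = []
    start = seq.find("ATG")
    while start != -1:
        stop = None
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                stop = i
                break
        results.append((start, stop))
        start = seq.find("ATG", start + 1)
    return results


# Hypothetical toy 5' UTR used only for illustration.
toy_utr = "GGCATGCCCTAGGGATGAAACCCGGG"
for atg, stop in find_upstream_orfs(toy_utr):
    if stop is None:
        print(f"ATG at {atg}: no in-frame stop in the supplied sequence")
    else:
        codons = (stop - atg) // 3
        print(f"ATG at {atg}: in-frame stop at {stop} after {codons} codons")
```

In the toy sequence the first ATG is closed by an in-frame TAG after two codons, the situation proposed above for the exon 4 ATGs, whereas the second ATG has no in-frame stop within the fragment.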
In addition to regulating translation, the 5' UTRs of some genes have also been shown to influence transcription by providing binding sites for transcription factors (Wilhide et al. 1997; Valhmu et al. 1998). We therefore analyzed the sequence of the 5' UTRs of ZNF177 for putative transcription factor binding sites with TESS. Examination of the repetitive sequences revealed the presence of two non-canonical Sp1 binding sites in the Alu sequence of exon 4 (see Figure A.5), which were perfect matches to a 10 base pair consensus binding site previously reported for Sp1 (Briggs et al. 1986). Since Sp1 factors have been shown to cooperatively stimulate transcription (Segal and Berk 1991) and to enhance transcription in an orientation-independent manner (Kadonaga et al. 1986), the presence of the two putative Sp1 binding sites, one in each transcriptional direction, in the Alu sequence of exon 4 could account for the significant increase in RNA levels observed when exon 4 is inserted in either direction of transcription in the 5' UTR of reporter genes. However, the potential involvement of Sp1 has not yet been experimentally tested. The presence of putative transcription factor binding sites within an Alu element, together with the observation that this repetitive sequence provides motifs which can enhance or suppress the expression of cellular genes, raises important questions regarding the biological impact and prevalence of such effects.

A GAG ATG CGG TCT TGC TGT GTT GCC TAG GCT GGT CTC AAA CTC CTG CTC TCA AGT GAT CCT CCT GCC TCA GCC TCC TGA GTA CAT TTA TAT TTA AAG TAA TTA TTG ATG G

Figure A.5 Sequence of exon 4. The Alu segment in exon 4 is enclosed by the dashed box while the rest of the sequence is part of the L1 element. Potential initiation codons (ATG) are shown in bold and stop codons in italics. Putative Sp1 binding sites are underlined.
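A search for consensus matches on both strands, of the kind described above for the putative Sp1 sites, might look like the following sketch. It is not the TESS analysis itself. The pattern `KGGGCGGRRY` is the commonly cited GC-box decanucleotide and is an assumption here, since the text does not spell out the exact consensus that was matched; `scan_motif` and the toy sequence are likewise hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_motif(seq, consensus):
    """Return sorted (position, strand, site) tuples for each match.

    position is the 0-based start of the site on the forward strand;
    strand is '+' for the given strand and '-' for its complement.
    """
    seq = seq.upper()
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead lets overlapping sites be reported as well.
    regex = re.compile(f"(?=({pattern}))")
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    for m in regex.finditer(reverse_complement(seq)):
        site = m.group(1)
        # Convert the reverse-strand coordinate back to the forward strand.
        forward_start = len(seq) - m.start() - len(site)
        hits.append((forward_start, "-", site))
    return sorted(hits)


# Assumed consensus (GC box); K = G/T, R = A/G, Y = C/T.
SP1_CONSENSUS = "KGGGCGGRRY"

# Hypothetical toy sequence with one site on each strand.
toy = "AAGGGGCGGGGCTTTTGCCCCGCCCCAA"
for position, strand, site in scan_motif(toy, SP1_CONSENSUS):
    print(position, strand, site)
```

Scanning the reverse complement as well as the given strand is what allows one site in each transcriptional direction to be detected, which is the arrangement proposed above for exon 4.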
While our experiments suggest that the retroelement sequences in ZNF177 enhance transcription, it is difficult to assess the consequence of these elements in vivo. The majority of CpG dinucleotides in Alu repeats are methylated in adult tissue (Schmid 1991; Hellman-Blumberg et al. 1993) and, since it is well accepted that methylation represses transcription (Kass et al. 1997), it is possible that the repetitive sequences in the ZNF177 locus do not enhance transcription in their natural context. However, as it has been shown that methylation of CpG sites downstream of promoter regions does not suppress transcription (Jones 1999), the potential enhancing effects of the Alu repeat, which is located over 2 kb from the closest ZNF177 promoter, are probably not blocked by methylation in vivo.

If our experimental data accurately reflect the effects of the repetitive sequences in vivo, the conflicting influence of the Alu segment on transcription and translation is intriguing. It is tempting to speculate that such competing mechanisms might have resulted in relatively little change in gene expression upon insertion of the Alu. This could explain why the Alu element was retained instead of being selected against during evolution.

Recently published analyses of the whole human genome sequence indicate the presence of approximately 1.6 million copies of Alu repeats (Lander et al. 2001). Based on their copy number, Alu sequences are expected to occur, on average, once every 2.0 kb of nuclear DNA, but their distribution within the human genome is not homogeneous. While the frequency of Alu elements is low in protein-coding regions of transcripts, as expected, these repetitive elements are found at a much higher incidence in the untranslated regions of fully spliced mRNAs (Makalowski et al. 1994; Yulug et al. 1995).
It has been estimated, on the basis of a 1995 database containing 1600 complete cDNAs, that Alu elements are present in 0.7% of 5' UTRs and 4.1% of 3' UTRs of human transcripts (Yulug et al. 1995). However, our search of a 5' and 3' untranslated region database suggests that these values might be underestimates. We have screened the non-redundant human UTR database (UTRdb) for Alu elements and found that 4% (271 out of 6669) of 5' UTRs and 21% (1572 out of 7503) of human 3' UTRs in this database contain Alu sequences. The discrepancy in the frequency of Alu elements present within the non-coding sequence of transcripts between our findings and those of Yulug et al. (1995) most likely results from differences in search parameters or thresholds, as the previous study used only two Alu sequences for the detection of repetitive sequences (Yulug et al. 1995), while our 5' UTRdb search relied on RepeatMasker for the identification of Alu elements. Although they rely on different search criteria, both estimates suggest that a significant proportion of fully spliced transcripts contain repetitive elements in their 5' UTR.

Table A.1 A selection of genes that contain Alu elements, similar to ZNF177, within their 5' UTRs.

Gene                                  Accession #   ATG   Sp1
intercellular adhesion molecule 1     U50463        2     0
regulator of G-protein signalling 9   AF073710      1     1
cathepsin B                           U44029        1     1
hyaluronidase 4                       AF009010      2     1
leukotriene B4 receptor               D89078        2     2
transforming growth factor β II R     D50682        0     1

The presence and number of initiation codons or non-canonical Sp1 binding sites within the Alu elements is indicated for each gene. The Alu repeats included in this table are at least 75% identical to the Alu segment present in the 5' UTR of ZNF177.

Detailed sequence analysis of numerous Alu repeats and consensus sequences has revealed the presence of several transacting motifs within these elements (Tomilin 1999).
Specifically, Sp1 binding sites have been detected in 14% of all Alu elements examined (Tomilin 1999). Our sequence analysis revealed that, in addition to harbouring cis-elements for Sp1 factors, Alu segments present at the 5' end of transcripts also frequently contain ATG codons and may form secondary structures. A list of known genes harbouring Alu elements within their 5' UTR which contain putative Sp1 binding sites and/or an initiation codon is shown in Table A.1. Together, these observations suggest that a regulatory function might be common to many Alu segments integrated in the 5' UTR of genes. Indeed, if 4% of the 5' UTRs of the estimated 30 000 to 35 000 (Lander et al. 2001) human genes harbour Alu elements, it is quite possible that the expression levels of several hundred genes are affected by these Alu insertions.
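The arithmetic behind these estimates can be checked with a few lines of code. This is a minimal sketch that uses only the counts quoted above; the variable names are illustrative, and extrapolating the UTRdb frequency to all human genes is the same assumption made in the text.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts reported for the UTRdb screen.
pct_5utr = percent(271, 6669)    # 5' UTRs containing Alu sequences
pct_3utr = percent(1572, 7503)   # 3' UTRs containing Alu sequences

# Gene-count range cited from Lander et al. (2001).
gene_estimates = (30_000, 35_000)

# Genes expected to carry an Alu in their 5' UTR if the UTRdb
# frequency held genome-wide (an extrapolation, not a measurement).
genes_with_alu = [round(n * pct_5utr / 100) for n in gene_estimates]

print(f"5' UTRs with Alu: {pct_5utr:.1f}%")
print(f"3' UTRs with Alu: {pct_3utr:.1f}%")
print(f"Genes with a 5' UTR Alu: {genes_with_alu[0]} to {genes_with_alu[1]}")
```

The resulting range of roughly 1200 to 1400 genes is the number that would carry an Alu insertion in their 5' UTR under this extrapolation. It is therefore an upper bound on the number of genes whose expression is actually affected.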
