UBC Theses and Dissertations

Regulatory elements within repeated elements : a case study of NAIP transcriptional innovation Romanish, Mark Taras 2009

REGULATORY ELEMENTS WITHIN REPEATED ELEMENTS: A CASE STUDY OF NAIP TRANSCRIPTIONAL INNOVATION

by

MARK TARAS ROMANISH

B.Sc.(Hon.), The University of Western Ontario, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2009

© Mark Taras Romanish, 2009

Abstract

Neuronal Apoptosis Inhibitory Protein (NAIP, also known as BIRC1) is a member of the conserved Inhibitor of Apoptosis Protein (IAP) family. However, it is no longer principally considered an apoptosis inhibitor since its domain structure and functions in innate immunity also warrant inclusion in the Nod-Like Receptor (NLR) superfamily. Lineage-specific rearrangement and expansion of this locus have yielded different copy numbers among primates and rodents, providing an interesting case in which to study transcriptional regulatory changes by a rapidly evolving gene. In the first stage of my thesis, I show that NAIP has multiple promoters sharing no similarity between human and rodents. Moreover, I demonstrate that multiple, domesticated long terminal repeats (LTRs) of endogenous retroviral (ERV) elements provide NAIP promoter function in human, mouse and rat. In human, an LTR serves as a tissue-specific promoter active primarily in testis. However, in rodents, our evidence indicates that an ancestral LTR common to all rodent genes is the major, constitutive promoter for these genes and that a second LTR found in two of the mouse genes is a minor promoter. Thus, independently acquired LTRs have assumed regulatory roles for orthologous genes, a remarkable evolutionary scenario. It is also demonstrated that 5’ flanking regions of IAP family genes as a group, in both human and mouse, are enriched for LTR insertions compared to average genes.
In the second stage of my thesis, I demonstrate that several of the human NAIP paralogues are expressed, and that novel transcripts arise from both internal and upstream transcription start sites. Remarkably, two internal start sites initiate within Alu short interspersed element (SINE) retrotransposons, and a third novel transcription start site exists within the final intron of the GUSBP1 gene, upstream of only two NAIP copies. One Alu functions alone as a promoter in transient assays, while the other likely combines with upstream L1 sequences to form a composite promoter. The novel transcripts encode shortened open reading frames and I show that corresponding proteins are translated in a number of cell lines and primary tissues, in some cases above the level of full length NAIP. Interestingly, some NAIP isoforms lack their caspase-sequestering motifs, indicating that they have novel functions. My results support an important role for transposable elements in NAIP evolution, particularly as transcriptional regulatory modules, and illustrate a fascinating example of regulatory innovations adopted by a rapidly evolving gene.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Co-authorship statement
Chapter 1 – Introduction
1.1 – Genomic transposable elements
1.2 – Types of repeats in the human genome
1.2.1 – Short interspersed elements (SINEs)
1.2.2 – Long interspersed elements (LINEs)
1.2.3 – Endogenous retroviral elements/LTR retrotransposons
1.3 – Life cycle of LTR and non-LTR retrotransposons
1.3.1 – LTR retrotransposon life cycle
1.3.1.1 – Current activity of human and mouse ERVs/LTR retrotransposons
1.3.2 – Non-LTR retrotransposon life cycle
1.3.2.1 – Current activity of human and mouse non-LTR retrotransposons
1.4 – Distribution of human and mouse genomic retrotransposons
1.4.1 – Distribution of LTRs in human and mouse genomes
1.4.2 – Distribution of LINEs in human and mouse genomes
1.4.3 – Distribution of SINEs in human and mouse genomes
1.5 – Effects of transposable elements in host genomes
1.5.1 – TE-mediated genomic rearrangements
1.5.2 – TE-mediated evolution of new genes
1.5.3 – Exonized transposable elements within proteins
1.5.4 – TE-mediated regulatory innovation of host genes
1.5.4.1 – Regulatory effects - Transcriptional termination
1.5.4.2 – Regulatory effects - Transcriptional elongation
1.5.4.2.1 – Domestication of LTRs as promoters of host genes
1.5.4.2.2 – Domestication of LINEs as promoters of host genes
1.5.4.2.3 – Domestication of SINEs as promoters of host genes
1.6 – A case study of TE-mediated regulatory innovation by a mammalian gene
1.6.1 – NAIP domain structure
1.6.2 – NAIP chromosomal arrangement
1.6.3 – NAIP function
1.6.4 – NAIP expression
1.7 – Thesis objectives
1.8 – References
Chapter 2 – Repeated recruitment of LTR retrotransposons as promoter by the NAIP genes during mammalian evolution
2.1 – Introduction
2.2 – Materials and methods
2.2.1 – RNA isolation
2.2.2 – 5’ rapid isolation of cDNA ends (5’ RACE)
2.2.3 – Genomic PCR and generation of constructs
2.2.4 – Cell culture and luciferase assays
2.2.5 – cDNA synthesis and RT-PCR
2.2.6 – Quantitative RT-PCR
2.2.7 – Sequencing
2.2.8 – Dotplots
2.2.9 – Analysis of retroelements in 5’ flanking regions
2.3 – Results
2.3.1 – Transcription of mammalian NAIP genes initiates within LTRs
2.3.2 – Tissue distribution of NAIP expression
2.3.3 – Promoter activity of the ORR1E LTRs
2.3.4 – Rapid evolution of the NAIP promoter regions
2.3.5 – Retroelement prevalence in IAP gene 5’ flanking regions
2.4 – Discussion
2.5 – References
Chapter 3 – Novel protein isoforms of the multicopy NAIP genes derive from intragenic Alu promoters
3.1 – Introduction
3.2 – Materials and methods
3.2.1 – Ethics statement
3.2.2 – RNA and reverse transcription
3.2.3 – RT-PCR
3.2.4 – 5’ rapid amplification of cDNA ends (5’ RACE)
3.2.5 – Quantitative RT-PCR
3.2.6 – Generation of constructs
3.2.7 – Cell culture and transient transfection
3.2.8 – Reporter gene assays
3.2.9 – Western blotting
3.2.10 – Computational tools
3.3 – Results
3.3.1 – Human NAIP is a multicopy gene
3.3.2 – Novel human NAIP transcription start sites
3.3.3 – Promoter activity of proximal NAIPSg and NAIPJb sequences
3.3.4 – Variable contribution of Alu-associated NAIP transcripts in different tissues
3.3.5 – Full-length Alu-derived transcripts are broadly expressed
3.3.6 – Novel human NAIP protein isoforms
3.3.7 – NAIP protein isoforms are broadly expressed in human tissues
3.4 – Discussion
3.5 – References
Chapter 4 – Significance and outstanding issues
4.1 – Role of TEs in NAIP gene evolution
4.2 – Unequal exchange and the emergence of new genes
4.3 – Genomic recycling and the assembly of new genes
4.4 – One genome’s trash is another’s treasure
4.5 – Two outstanding issues
4.5.1 – Transcription of NAIP orthologues
4.5.2 – Experimentation with TEs by amplified genes
4.6 – Broad implications of these findings
4.6.1 – TEs and transcriptional networks
4.6.2 – Role of TEs in facilitating proteome diversification
4.7 – Concluding remarks
4.8 – References
Appendix A – Chapter 2 supplementary figures and tables
Appendix B – Chapter 3 supplementary figures and tables

List of Tables

Table 1.1 – TE types, copy numbers, and genomic coverage in human and mice
Table A.1 – LTR insertions within the analyzed windows for all human and mouse IAP genes
Table A.2 – Primers and associated information
Table B.1 – Primers used in this report
List of Figures

Figure 1.1 – The structure of a typical integrated Alu SINE
Figure 1.2 – The structure of a typical integrated SVA SINE
Figure 1.3 – The structure of a typical integrated L1 LINE
Figure 1.4 – The structure of a typical ERV/LTR retrotransposons
Figure 1.5 – Recruitment of TEs into coding regions of host genes
Figure 1.6 – TE-mediated aberrant poly-adenylation of host genes
Figure 1.7 – Recruitment of TEs as promoters of host genes
Figure 1.8 – NAIP protein domains and familial designations
Figure 1.9 – Genomic organization of the human and mouse NAIP copies
Figure 1.10 – Dot plot comparing human and mouse NAIP upstream regions
Figure 2.1 – Contribution of LTR promoters to human NAIP transcription and a summary of 5’ RACE results
Figure 2.2 – Contribution of LTR promoters to mouse and rat Naip transcription and a summary of 5’ RACE results
Figure 2.3 – Transcriptional profile of human (A), mouse (B), and rat (C) NAIP across the indicated primary tissues
Figure 2.4 – Promoter activity of the mNaip LTRs
Figure 2.5 – Association of LTR elements with NAIP through mammalian evolution
Figure 2.6 – Comparison of genomic sequence surrounding the rodent Naip ORR1E LTRs
Figure 2.7 – Density of TE sequence in 5' flanking regions of IAP genes compared to random gene sets
Figure 3.1 – Expression of predicted NAIP copies in the sequenced human genome
Figure 3.2a – Identification of novel NAIP transcription start sites
Figure 3.2b – Identification of novel NAIP transcription start sites
Figure 3.3 – Contribution of Alu-initiated isoforms to total NAIP transcription
Figure 3.4 – Expression of full-length NAIPJb transcripts across many tissues
Figure 3.5 – Detection of novel NAIP protein isoforms
Figure 3.6 – Expression of NAIP protein isoforms in primary human tissues
Figure A.1 – Analysis of human NAIP 5’ UTR and coding region splice isoforms
Figure A.2 – Analysis of mNaip 5’ UTR and coding region splice isoforms
Figure B.1 – Homology of human NAIP copies
Figure B.2 – Unequal levels of NAIP 5’ and 3’ transcription
Figure B.3 – Analysis of NAIPfull transcription
Figure B.4 – Sequence analysis underlying NAIP transcription start sites for the novel NAIPSg (A), NAIPJb (B), and NAIPGUSBP1 (C) regulatory regions
Figure B.5 – Broad transcription of novel NAIP isoforms
Figure B.6 – NAIP protein sequence and encoded domains

List of Abbreviations

ALV        avian leukemia virus
BIR        baculoviral IAP repeat
BIRC1      baculoviral IAP repeat containing 1
CAGE       cap analysis of gene expression
cDNA       complementary DNA
cIAP       cellular IAP
CNV        copy number variation
CpIAP      Cydia pomonella IAP
DNA        deoxyribonucleic acid
env        envelope
ERV        endogenous retrovirus
gag        group specific antigen
gDNA       genomic DNA
HA         hemagglutinin
HIV        human immunodeficiency virus
IAP        inhibitor of apoptosis
IPAF       ice protease activating factor
INT        integrase
LINE       long interspersed nuclear element
L1         long interspersed nuclear element 1
LRR        leucine rich repeat
LTR        long terminal repeat
MaLR       mammalian apparent LTR retrotransposon
MIR        mammalian interspersed repeat
mNaip      mouse Naip
ML-IAP     melanoma IAP
MLV        moloney murine leukemia virus
MMTV       mouse mammary tumour virus
mRNA       messenger RNA
MTC        mouse transcript
NAIP       Neuronal apoptosis inhibitory protein
NAIPERV-P  LTR-transcribed human NAIP
NAIPfull   full-length NAIP
NAIPJb     AluJb-derived human NAIP
NAIPSg     AluSg-derived human NAIP
NAIP1      centromeric NAIP
NAIP2      telomeric NAIP
ΨNAIP      pseudogene NAIP
NBD        nucleotide binding domain
NOD        nuclear oligomerization domain
NLR        Nod-like receptor
OpIAP      Orgyia pseudotsugata IAP
ORF        open reading frame
ORR1E      origin region repeat 1E
P          promoter
PCR        polymerase chain reaction
PBL        peripheral blood leukocytes
pol        polymerase
pol II     RNA polymerase II
pol III    RNA polymerase III
poly-A     poly-adenylation
pro        protease
RACE       rapid amplification of cDNA ends
RNA        ribonucleic acid
rNaip      rat Naip
RT         reverse transcriptase
SD         segmental duplication
SINE       short interspersed nuclear element
SMA        spinal muscular atrophy
SRP        signal recognition particle
SVA        SINE-R, VNTR, Alu
TE         transposable element
TF         transcription factor
TLR        toll-like receptor
TPRT       target primed reverse transcription
tRNA       transfer RNA
TSD        target site duplication
Ts-IAP     testis specific IAP
TSS        transcription start site
UCSC       University of California Santa Cruz
UTR        untranslated region
VLP        virus-like particle
XIAP       X-linked IAP

Acknowledgements

Reflecting on my time as a graduate student, which has now come to a successful conclusion, it is obvious that I am greatly indebted to many people. Foremost, I wish to offer sincere gratitude to my supervisor Dr. Dixie Mager who accepted me as a student and fostered my abilities as a researcher. I truly appreciate the freedom that I was afforded to pursue research questions, but more so her availability to discuss new findings and to provide direction when my work was losing its focus. Of the many things that she has taught me, one that stands out is the ability to critically assess the merits of a particular hypothesis and to subsequently chart a course of action that most directly addresses it. This is a skill I will utilize for the rest of my life, regardless of where my career path takes me. Thank you.

Secondly, I am grateful to current and former lab members, as well as others I have befriended while at the Terry Fox Laboratory, for their technical assistance and helpful discussions. Aside from work related matters, these are all individuals who provided me a sense of comfort and allowed me to quickly adjust to my new surroundings. I will forever remember you as friends and am eager to keep contact to see how your futures unfold.

Thirdly, I wish to thank an extra special person, Hisae Nakamura, for coming into my life at the right time, if not completely by chance. You have been a very stabilizing influence, and have helped me in more ways than you know to achieve this exciting milestone. I look forward with earnest to what comes next, and I can only hope to be as helpful to you when your thesis is in its final stages…soon!

Finally, and perhaps most importantly, I would like to thank my family whose unwavering support through all of my pursuits is truly a blessing. Often I catch myself wondering where I might have ended up had it not been for their unconditional love and the values they have instilled within me. Everything that I have and will achieve, I dedicate to you.

Co-authorship statement

Chapter 2: M.T. Romanish conducted all experiments reported in this chapter, except the analysis discussed in 2.3.5, with technical mentorship and guidance provided by Dr. C.A. Dunn and Dr. D.L. Mager. The results represented in Section 2.3.5 were obtained using a script written and executed by W.M. Lock, and W.M. Lock and M.T. Romanish jointly generated the corresponding Figure 2.7. L.N. van de Lagemaat discovered NAIP as a potential example of LTR promoter domestication by a host gene in his unpublished observations. M.T. Romanish drafted the manuscript, with advisory assistance from D.L. Mager.

Chapter 3: M.T. Romanish designed and conducted all experiments reported in this chapter, except for the assistance from C.B. Lai in elucidating the preferential utility of an AP-1 binding motif in the AluJb element, which adheres more closely to the derived consensus sequence for this transcription factor than does the AluSg element. H.N. Nakamura provided mentorship in initial Western blot experiments, using materials and reagents obtained from the Y.Z. Wang lab. M.T. Romanish drafted the manuscript, with advisory assistance from D.L. Mager.

CHAPTER 1 INTRODUCTION

1.1 Genomic transposable elements

The concept that genome size does not correlate with the complexity of an organism predated the advancement of whole-genome sequencing technology (Gregory, 2001). This apparent ‘C-value paradox’, where ‘C’ represents the amount of DNA in the haploid genome, is strikingly exemplified by the >10^5-fold difference in genome size among eukaryotes (Gregory, 2001).
However, increased C-value does correlate with an increase in the copy number of genomic transposable elements (Hua-Van et al., 2005; Kidwell, 2002). Transposable, or transposed, elements (TEs) have been identified in all eukaryotic species studied (Dewannieux and Heidmann, 2005; Kidwell and Lisch, 2000). TEs are broadly classified as the class 1 retrotransposons and the class 2 DNA transposons. The retrotransposons proliferate via a ‘copy and paste’ mechanism that utilizes a reverse transcribed RNA intermediate, while the DNA transposons are mobilized by a ‘cut and paste’ mechanism, whereby the element is excised and then re-inserts at a new locus.

1.2 Types of repeats in human and mouse genomes

The initial sequences of the human (Lander et al., 2001) and mouse (Waterston et al., 2002) genomes revealed that TEs comprise approximately half of these genomes. On the one hand, the majority of human and mouse TEs belong to the class 1 elements, which are further divided into three main types, namely the short interspersed elements (SINEs), long interspersed elements (LINEs), and endogenous retroviral (ERV) long terminal repeats (LTRs). One interesting family of SINE, the composite SVAs (for SINE, VNTR, Alu), has arisen only within the genomes of hominids (Goodier and Kazazian, 2008; Wang et al., 2005). The SINEs, LINEs, and LTRs occupy comparable fractions of the human and mouse genomes (Lander et al., 2001; Waterston et al., 2002) (Table 1.1), and are discussed in greater detail below. On the other hand, the class 2 DNA transposons occupy roughly 1% and 3% of the mouse and human genomes, respectively (Lander et al., 2001; Waterston et al., 2002). With the exception of bats (Ray et al., 2007) and a new world primate (Pace et al., 2008), no active DNA transposons are known to exist in mammals, and they will therefore not be discussed further.
However, in other species such as Drosophila, Class 2 elements belong to numerous families and many of these are highly mutagenic, most notably the P-element (Pinsker et al., 2001).

Table 1.1 TE types, copy numbers, and genomic coverage in human and mouse

                        Thousands of       Fraction of       Thousands of       Fraction of
                        copies (human)[a]  genome (%)[a]     copies (mouse)[b]  genome (%)[b]
SINEs                   1,561              13.25             1,498              8.22
  Alu/B1                1,090              10.6              564                2.66
  MIR                   468                2.5               115                0.57
  B2                    -                  -                 348                2.39
  B4                    -                  -                 391                2.36
  ID                    -                  -                 79                 0.25
  SVA                   3[c]               0.15[c]           -                  -
LINEs                   868                20.42             660                19.2
  L1                    516                16.89             599                18.78
  L2                    315                3.22              53                 0.38
  L3                    37                 0.31              8                  0.05
LTRs                    443                8.29              631                9.87
  Class I               112                2.89              34                 0.68
  Class II (ERV-K)      8                  0.31              127                3.14
  Class III (ERV-L)     83                 1.44              388                0.58
  MaLR                  240                3.65              112                4.82
DNA Transposons         294                2.84              112                0.88

[a] Taken from (Lander et al., 2001); [b] taken from (Waterston et al., 2002); [c] taken from (Wang et al., 2005).

1.2.1 Short interspersed elements (SINEs)

Typical SINEs are ~80-400 bp long and are found in organisms ranging from mammals to protozoans (Ohshima and Okada, 2005). All SINEs derive from cellular small RNAs, and most have a common origin from tRNAs (Ohshima and Okada, 2005; Smit, 1999). Interestingly, in the rodent and primate lineages, a new family of SINE emerged, B1 and Alu elements, respectively (Figure 1.1). Both B1 and Alu differ from other SINEs in that they derive from the 7SL RNA of the signal recognition particle (Quentin, 1994). Alus consist of two 7SL monomers linked by an A-rich segment, whereas the B1s are not dimerized (Quentin, 1992; Schmid, 1996). Like their predecessors, all SINEs are transcribed by RNA polymerase III from internal promoters containing an A and B box (Tomilin, 2008), and are non-autonomous since they do not encode any open reading frames (Dewannieux and Heidmann, 2005).

Figure 1.1 The structure of a typical integrated Alu SINE. Two 7SL monomers dimerized, and underwent some slight modification.
The A and B boxes denoting the RNA polymerase III promoter are only present in the left monomer, and the shaded box is deleted in the left monomer. Terminal arrows represent target site duplications (TSDs) generated upon integration into genomic DNA.

A lack of coding potential has not hindered their ability to retrotranspose; indeed, SINEs have been tremendously efficient in this regard and have amplified to ~1.5 × 10^6 copies in each of the human and mouse haploid genomes (Lander et al., 2001; Waterston et al., 2002) (Table 1.1). In fact, Alu elements alone contribute more than one million copies and represent the most abundant elements in the human genome, while the older mammalian interspersed repeat (MIR) SINEs constitute the remainder (Lander et al., 2001). Mouse SINEs are classified into five distinct subfamilies: the B1, B2, B4, ID, and MIR subfamilies. The most plentiful of these are the B1 at >5 × 10^5 copies, followed by <4 × 10^5 B2 and B4s, and >1 × 10^5 MIR elements (Waterston et al., 2002) (Table 1.1). Several lines of evidence indicate that SINEs rely on the proteins expressed by other autonomous elements within the host genome for their proliferation. It has been directly shown that LINE proteins can mobilize marked Alu elements in cell lines (Dewannieux et al., 2003). Moreover, the existence of SINEs with homology to a particular subfamily of LINE in their 3’ terminus provides further evidence of this parasitic interdependence (Ohshima and Okada, 2005).

Although classified as SINEs, the hominid-specific SVAs are composite elements in which variable number tandem repeats (VNTRs) are flanked by a partial Alu at the 5’ terminus and a human ERV-K (HERV-K) LTR at the 3’ terminus (Goodier and Kazazian, 2008) (Figure 1.2). SVAs are highly polymorphic in the human genome, and are represented by approximately 3000 copies (Wang et al., 2005). It has been speculated that they may be ‘the next big thing’ in terms of human genomic repeats (Goodier and Kazazian, 2008).
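
As a quick cross-check of the SINE copy numbers quoted above, the following Python sketch re-tallies the per-family counts listed in Table 1.1. The dictionary layout and the function name summarize_sines are illustrative choices made here and are not part of the original analysis; the numbers themselves are copied from the table (thousands of copies).

```python
# Per-family SINE copy numbers transcribed from Table 1.1 (thousands of copies).
# The data layout and function name are illustrative assumptions.
SINE_COPIES_THOUSANDS = {
    "human": {"Alu": 1090, "MIR": 468, "SVA": 3},
    "mouse": {"B1": 564, "MIR": 115, "B2": 348, "B4": 391, "ID": 79},
}

# Class-level SINE totals as reported in the first row of the SINE block.
REPORTED_TOTALS_THOUSANDS = {"human": 1561, "mouse": 1498}


def summarize_sines(species):
    """Return (summed copies in thousands, share of each family in the sum)."""
    families = SINE_COPIES_THOUSANDS[species]
    total = sum(families.values())
    shares = {name: count / total for name, count in families.items()}
    return total, shares


if __name__ == "__main__":
    for species in SINE_COPIES_THOUSANDS:
        total, shares = summarize_sines(species)
        reported = REPORTED_TOTALS_THOUSANDS[species]
        print(f"{species}: summed {total}k, reported {reported}k")
        for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
            print(f"  {name}: {share:.1%} of SINE copies")
```

With these values the human families sum exactly to the reported 1,561 thousand copies, whereas the mouse families sum to 1,497 thousand, one short of the reported 1,498 thousand, presumably because of rounding in the source table. Both totals agree with the ~1.5 × 10^6 copies cited in the text.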
Figure 1.2 The structure of a typical integrated SVA SINE. The composite SVA element ranges from 2-4 kb long, and comprises an Alu fragment and HERV-K LTR connected by variable number tandem repeats (VNTR). An integrated element is flanked by target site duplications (TSD).

1.2.2 Long interspersed elements (LINEs)

The LINEs are also present in all eukaryotes, and their emergence necessarily predated that of the LINE-dependent SINEs (Dewannieux et al., 2003). Furthermore, the origin of LINEs is unclear, but it has been proposed to predate that of LTR retrotransposons and ERVs (Malik and Eickbush, 2001), since only non-LTR retrotransposons are found in trypanosomes, an ancient eukaryotic species (Boeke and Stoye, 1997). An integrated full-length LINE element is ~6 kb, encodes two open reading frames, ORF1 and ORF2, and is flanked by AT-rich target site duplications of up to 20 bp (Dewannieux and Heidmann, 2005) (Figure 1.3). Transcription of the polycistronic LINE genome initiates from an internal RNA polymerase II promoter embedded within the LINE 5’ UTR and terminates at a poly-adenylation signal present in its 3’ UTR (Dewannieux and Heidmann, 2005). The ORF2 protein is well understood and provides endonuclease and reverse transcriptase functions (Han and Boeke, 2005). ORF1 encodes a protein that is also indispensable for L1 function, and exhibits intrinsic single stranded nucleic acid binding properties, as well as sequence similarity to nucleic acid binding chaperone proteins (Martin, 2006; Martin et al., 2008).

Figure 1.3 The structure of a full-length integrated L1 LINE. A full length L1 is ~6 kb long and encodes two proteins, ORF1 and ORF2, required for its lifecycle. ORF2 encodes the endonuclease (EN) and reverse transcriptase (RT) functions, while ORF1 encodes an RNA binding protein with chaperone function.
The bent arrow denotes the transcription start site of the L1 mRNA that initiates from an RNA polymerase II promoter embedded in its 5’ UTR. An integrated L1 is flanked by target site duplications (TSD).

Whereas SINEs are the most abundant elements in the human and mouse genomes, LINEs occupy the largest fraction (~20%) of these genomes (Table 1.1) (Lander et al., 2001; Waterston et al., 2002). Human and mouse LINEs are subdivided into the L1, L2, and L3 subfamilies, with the L1 subtype accounting for ~85% and ~95% of all LINE-derived sequences, respectively. The L2 and L3 elements, by contrast, have been extinct since the divergence of rodents and primates (Goodier and Kazazian, 2008), and do not occupy significant fractions of either genome (Lander et al., 2001; Waterston et al., 2002). In fact, an increased rate of substitution in the mouse genome has rendered most L2 elements unrecognizable (Lander et al., 2001). Due to the ongoing activity of LINEs in mobilizing other sequences, they are directly responsible for ~40% of the human and mouse genomes (Kazazian, 2004).

1.2.3 Endogenous retroviral elements/LTR retrotransposons

The endogenous retroviral elements (ERVs) have a common structure, and perhaps origin, with exogenous retroviruses (de Parseval and Heidmann, 2005) (Figure 1.4). An ERV provirus can range in size, but full-length elements are typically 5-9 kb in length, and are flanked by target site duplications (Smit, 1996). Internal to the target site duplications, direct LTRs bound the open reading frames of the viral genome: gag (group-specific antigen)-pro (protease), pol (polymerase), and env (envelope). Expression of the viral genes is under the control of RNA pol II regulatory signals that are contained within the LTRs (Boeke and Stoye, 1997).
The resulting polyprotein is processed by the pro protein into the following functional subunits: gag encodes the structural proteins that form the virus-like particle; pol encodes reverse transcriptase, RNase H, and integrase functions; and env encodes a transmembrane protein required during the extracellular stages of the viral lifecycle (Vogt, 1997). Endogenous retroviruses usually lack or encode mutant env genes (de Parseval and Heidmann, 2005), thereby restricting them to an intracellular existence.

[Diagram labels: TSD, LTR (U3, R, U5), gag, prt, pol (RT, RNaseH, INT), env, aataaa, LTR (U3, R, U5), TSD]

Figure 1.4 The structure of a typical integrated ERV/LTR retrotransposon. A full-length ERV/LTR retrotransposon ranges in size from 7-10 kb, and encodes the proteins required for its life cycle. Most genomic ERVs/LTR retrotransposons either lack the envelope (env) genes or encode a mutated variant. The components of the virus-like particle are encoded by gag and the reverse transcriptase (RT), RNase H, and integrase (INT) functions are provided by the polymerase (pol) gene. The RNA polymerase II transcriptional regulatory signals are embedded within the long terminal repeats (LTRs), with the promoter (bent arrow) in the U3 region and the poly-adenylation signal marking the boundary of the R and U5 regions.

Human and mouse ERVs account for ~8.5% and ~10% of the entire genome, respectively, but ~85% exist as solitary LTRs (Lander et al., 2001; Waterston et al., 2002), a result of recombination between the terminal LTRs and deletion of the intervening sequence. ERVs are categorized into three broad families, Class I, II, and III retroviruses (de Parseval and Heidmann, 2005; Waterston et al., 2002) (Table 1.1). The Class I, or gammaretroviruses, are distinguished based on their homology to Moloney Murine Leukemia Virus (MLV) and occupy <1% and ~3% of the mouse and human genomes, respectively.
The Class II, or betaretroviruses, are defined by their similarity to the Mouse Mammary Tumour Virus (MMTV) and comprise >3% and 0.3% of mouse and human genomes, respectively. This 10-fold difference is explained by the ongoing activity of the rodent-specific Class II families (Maksakova et al., 2006; Zhang et al., 2008). Lastly, the Class III elements are typified by the ERV-L subtype; these are the oldest mammalian ERVs (de Parseval and Heidmann, 2005). Grouped together with the Class III elements are the Mammalian Apparent LTR Retrotransposons (MaLRs). The MaLRs possess LTRs but their internal region, if present, lacks homology to any other ERV-encoded genes (Smit, 1996). However, Class III elements occupy the largest ERV-derived fraction of both human and mouse genomes, at ~5.5% (Lander et al., 2001; Waterston et al., 2002).

1.3 Life cycle of LTR and non-LTR retrotransposons

The LTR and non-LTR retrotransposons replicate by a copy and paste mechanism; therefore, their activity results in an increase of host genome size, as well as their own copy number (Goodier and Kazazian, 2008). In general, transcription of an integrated element from its embedded promoter produces a transcript that is subsequently reverse transcribed and inserted into a new locus within the host genome. Despite a common mechanism of replication, one key difference distinguishes the LTR from the non-LTR retrotransposons: whereas the LTR elements are reverse transcribed in the cytoplasm, non-LTR elements are reverse transcribed in the nucleus (Kazazian, 2004).

1.3.1 LTR retrotransposon life cycle

Following transcription of the provirus from the LTR-contained regulatory signals, the viral transcript, as with any other mRNA, is poly-adenylated and capped prior to being transported to the cytoplasm for translation (Dewannieux and Heidmann, 2005).
The resulting polyprotein is processed into its basic components, and they assemble into a virus-like particle (VLP) including 2 copies of the viral RNA and accessory proteins. The viral RNA is converted into DNA inside the VLP by the co-packaged pol; the DNA subsequently re-enters the nucleus, where genomic integration is mediated by int. Some evidence indicates that the integrases of particular retroviruses have unique integration preferences. For example, HIV exhibits a preference for transcription units, while the yeast Ty retrotransposons and MLV tend to insert in or near promoters (Garfinkel, 2005; Mitchell et al., 2004; Wu et al., 2003). Moreover, a reconstituted HERV-K element preferentially inserted in or near genes (Brady et al., 2009). In contrast, avian leukemia virus (ALV) integrations seem to have the most random distribution (Mitchell et al., 2004; Narezkina et al., 2004).

1.3.1.1 Current activity of human and mouse ERVs/LTR retrotransposons

Essentially all human ERVs are defective due to mutation or deletion of their internal regions; in fact, as mentioned above, ~85% exist only as solitary LTRs (Lander et al., 2001). However, the primate-specific HERV-K family contains intact members and a few insertional polymorphisms have been reported (Belshaw et al., 2005; Costas, 2001; Turner et al., 2001). Some argue that active, exogenous HERV-K could still exist in the human population (Belshaw et al., 2005), but this has not been demonstrated. Interestingly, ~38,000 viral ORFs have been found in the human genome, although most are shortened and non-functional (Villesen et al., 2004). However, 42 long HERV ORFs have been identified in the human genome, with 17, 12, and 29 corresponding to the Gag, Pol, and Env proteins, respectively. These numbers increase to 27, 23, and 43 when ORFs that could be corrected by a single mutation are included, and most belong to the HERV-K family.
In contrast to human, numerous ERV families are still actively retrotransposing in the genomes of inbred mouse strains (Maksakova et al., 2006). The intracisternal A-type particle and MusD (or the related non-autonomous Early Transposons (Etns)) families are represented by ~700 and ~100 (or ~300 Etns) full-length copies in the C57BL/6 genome, respectively (Dewannieux and Heidmann, 2005; Zhang et al., 2008). In addition, MLVs are highly polymorphic among mouse strains, but exist in very low copy numbers (Bannert and Kurth, 2004; Mager and Medstrand, 2003; Waterston et al., 2002). Finally, a mouse ERV-L element has been cloned that encodes intact open reading frames corresponding to the gag and pol genes (Benit et al., 1997), indicating that this ancient family of ERVs may still be active in the mouse genome (de Parseval and Heidmann, 2005). It has been calculated that ~10% of spontaneous mutations in the mouse germ line arise from ERV insertional mutagenesis (Maksakova et al., 2006).

1.3.2 Non-LTR retrotransposon life cycle

A major difference between the autonomous LINEs and the non-autonomous SINEs is that the former are transcribed by pol II (Han and Boeke, 2005), while the latter are transcribed by pol III (Tomilin, 2008). However, their life cycles are similar to each other since they both utilize the same proteins to propagate (Dewannieux et al., 2003). A primary transcript expressed from its associated promoter is subsequently transported into the cytoplasm, but only the LINE mRNA can produce proteins (Goodier and Kazazian, 2008). These proteins and the LINE mRNA form a ribonucleoprotein particle that is transported back into the nucleus (Kulpa and Moran, 2006). Once inside the nucleus, re-integration occurs in a process known as target-primed reverse transcription (TPRT) (Martin et al., 2005).
Reverse transcription takes place at the DNA site that is nicked by the ORF2 endonuclease, often at a TTTT/AA motif (Cost and Boeke, 1998; Feng et al., 1996), with the exposed 3’ OH serving as a primer (Martin et al., 2005). A cis-preference of the L1 proteins for their own mRNA has been observed (Kulpa and Moran, 2006), which calls into question how the SINEs have amplified with such success. One model suggests that due to their secondary structure and because they are tRNA- or SRP-related, SINEs interact closely with the ribosome and are thus favourably juxtaposed to interact with LINE ORF1 and ORF2 proteins as they are translated (Dewannieux and Heidmann, 2005).

1.3.2.1 Current activity of human and mouse non-LTR retrotransposons

Analysis of genomic DNA indicates that L1s and Alus can be categorized into unique subtypes, based on sequence divergence (Batzer and Deininger, 2002; Kazazian and Moran, 1998). The master gene hypothesis states that at any given point in time during mammalian evolution, only a small proportion of all L1s and their associated SINEs were active (Deininger et al., 1992). This theory adequately explains the existence of the numerous L1 and Alu/B1 subtypes, and suggests that as one dies out a new one is poised to ‘take over’. Indeed, L1s are still retrotransposing in both human and mouse, and these elements belong to the youngest, or least diverged, subgroups: Ta in human and TF, GF, and A in mouse (Kazazian and Moran, 1998; Zemojtel et al., 2007). Approximately 50-100 active L1s exist within the human genome, whereas up to 3000 full-length copies exist in the mouse genome (Brouha et al., 2003; Kazazian, 2004; Zemojtel et al., 2007). Naturally, as long as functional L1 proteins are generated, Alus are also expected to remain active retrotransposons.
Similar to L1s, the least diverged Alu elements also constitute the active fraction (Batzer and Deininger, 2002), and presence/absence polymorphisms have been used in population genetics studies (Batzer et al., 1996; Perna et al., 1992). Although any Alu can be mobilized in principle, as long as it carries a functional promoter, in vivo transcription is largely dependent on adjacent regulatory motifs in the genomic DNA (Chu et al., 1995; Ullu and Weiner, 1985). Both L1 and Alu retrotranspositions have been implicated in numerous human and mouse diseases by causing insertional mutagenesis (Chen et al., 2005; Deininger and Batzer, 1999). It is estimated that Alu and L1 mobilization occurs approximately once in every 20 and 50 human births, respectively (Cordaux et al., 2006; Kazazian, 2004).

1.4 Distribution of human and mouse genomic retrotransposons

Availability of the human and mouse genome sequences has permitted analysis of TE distribution. The current localization of TEs has been shaped by millions of years of selection, and is therefore not a good indicator of initial integration site preferences. Target site specification, however, can be assessed by studying polymorphic insertions or in cell culture assays using extant retrotransposons or endo- or exogenous retroviruses.

1.4.1 Distribution of LTRs in human and mouse genomes

Several reports have observed that LTRs are over-represented in regions of the human and mouse genomes exhibiting a low GC% (Lander et al., 2001; Medstrand et al., 2002; Smit, 1999; van de Lagemaat et al., 2003; Waterston et al., 2002). Interestingly, when the distributions of young and old LTR elements in the human genome are compared, Class II, which includes the youngest ERVs, is skewed toward higher GC regions. In contrast, the old Class III families are biased for lower GC, while the intermediate Class I ERVs are found in regions exhibiting intermediate GC% (Medstrand et al., 2002).
Taken together, these and other data indicate that ERVs are gradually being ‘cleared’ from gene-rich regions by selection, resulting in their accumulation within low GC% regions (Brady et al., 2009; Zhang et al., 2008). Furthermore, all classes of ERVs are under-represented within genes and their prevalence increases with distance from the transcription start site (Medstrand et al., 2002). Interestingly, intronic integrations are far less prevalent in the sense orientation, with respect to gene polarity (Smit, 1999; van de Lagemaat et al., 2006). However, an analysis of LTR elements overlapping transcription start sites of cellular genes indicates these are preferred in the same orientation as the gene, indicating a role in the transcriptional control of genes via their embedded pol II signals (see 1.5.4) (Dunn et al., 2005).

1.4.2 Distribution of LINEs in human and mouse genomes

The genomic distribution of LINEs, in general, is also biased toward low GC% regions in both human and mouse (Lander et al., 2001; Medstrand et al., 2002; Smit, 1999; Waterston et al., 2002). In the human genome, L1 elements are significantly over-represented in AT-rich regions (Akagi et al., 2008; Lander et al., 2001; Medstrand et al., 2002; Smit, 1999), while the low copy and older L2 elements exhibit a more even distribution in all genomic isochores (Medstrand et al., 2002). Indeed, polymorphisms induced by active L1s in different inbred mouse strains also exhibit a proclivity for AT-rich chromosomal regions (Akagi et al., 2008). The target specificity of L1s for TTTT/AA (Cost and Boeke, 1998; Feng et al., 1996) may explain the observation that they are over-represented in AT-rich isochores. Interestingly, fixed L1s are over-represented on the X chromosome in comparison to autosomes; however, the autosomes are ~3-fold more likely targets for actively retrotransposing L1s, compared to the X chromosome (Akagi et al., 2008).
Determination of LINE distribution across the mouse Y chromosome has not been possible because of incomplete sequence assembly, low coverage, and the presence of arrayed Huge Repeats (Alfoldi, 2008). Despite the under-representation of L1s within genes, it has been calculated that 75% of human genes bear at least one such integration, although usually within introns or UTRs (Han et al., 2004). In general, intronic L1s are biased to the antisense orientation regardless of GC% (Smit, 1999). However, in the 5 kb upstream of a gene’s TSS, L1s are markedly under-represented in either orientation, and their prevalence increases with distance from the TSS (Medstrand et al., 2002). These observations, once again, can be explained by selection against their disruptive effect on the transcription, splicing, and/or poly-adenylation of genes (see 1.6.4). In contrast, both inter- and intragenic L2s are more evenly distributed, and are only slightly less likely to be in the sense orientation (Medstrand et al., 2002), indicative of a benign effect on host genomes.

1.4.3 Distribution of SINEs in human and mouse genomes

Since the SINEs have co-evolved with partner LINEs (Dewannieux et al., 2003; Ohshima and Okada, 2005), they are expected to insert with the same preference for AT-rich isochores. Indeed, young or recent Alu insertions are found in AT-rich isochores; however, the current distribution of Alu and MIR elements is skewed toward GC-rich and gene-rich regions (Grover et al., 2004; Lander et al., 2001; Medstrand et al., 2002; Smit, 1999). In fact, approximately 75% of human genes possess at least one Alu insertion (Grover et al., 2004). The over-representation of most SINEs in genes may be explained by the potentially negative effects arising from Alu-Alu recombination in gene-rich regions, which removes exonic or regulatory sequences (Batzer and Deininger, 2002).
Conversely, recombination within the gene-poor AT-rich chromosomal regions is likely to be of little consequence, thus allowing for effective removal of extraneous sequences (Batzer and Deininger, 2002). However, it is equally likely that SINE insertions in GC-rich isochores are under selection, rather than being retained as neutral passengers, but a potential functional role is not well understood and is discussed later (1.5). Similar to the older Alu subfamilies, all mouse SINEs (B1, B2, ID, B4, and MIR) exhibit increased proclivity for GC-rich isochores (Jurka et al., 2005). In terms of genic Alus, only a slight sense orientation bias exists (Medstrand et al., 2002), despite a general over-representation within genes (Lander et al., 2001; Medstrand et al., 2002; Smit, 1999). Inter- and intragenic MIRs exhibit a more even distribution and no orientation bias (Medstrand et al., 2002).

1.5 Effects of transposable elements in host genomes

Most human and mouse TEs are no longer mobile, and persist in host genomes as they mutate beyond recognition (Lander et al., 2001; Waterston et al., 2002). Therefore, most are not a threat as insertional mutagens. This fact prompted the belief that TEs are junk DNA, serving only their own purpose (Doolittle and Sapienza, 1980; Orgel and Crick, 1980). However, classic early studies pointed to a potential role for TEs in regulating gene expression (Britten and Davidson, 1969; McClintock, 1953), most famously by Barbara McClintock more than sixty years ago in maize (McClintock, 1953). While this hypothesis and early findings failed to convince the mainstream, an abundance of data has emerged in recent years indicating their undeniable importance in genomic and genic evolution.

1.5.1 TE-mediated genomic recombination

Given that TEs are incredibly abundant within human and mouse genomes, they serve as ubiquitous points of homology distributed throughout a genome (Dewannieux and Heidmann, 2005; Goodier and Kazazian, 2008).
The result of a rearrangement is dependent on the orientation of the recombining TEs relative to one another, and whether the event is intra- or interchromosomal. Intrachromosomal rearrangements between direct repeats and inverted repeats lead to deletions or inversions, respectively, while interchromosomal events result in translocations, segmental duplications, and deletions. In some cases, disease may arise from aberrant germ-line or somatic recombination, generally due to intrachromosomal deletions or translocations (Chen et al., 2005; Deininger and Batzer, 1999; Ostertag and Kazazian, 2001). Segmental duplications (SDs), defined as large tracts of DNA (>10 kb) that exhibit a high degree of sequence similarity (>90%), arise by misalignment and recombination of chromosome pairs in the germ-line or early stages of development (Coghlan et al., 2005; Green and Chakravarti, 2001). Sequencing of the human and mouse genomes has revealed that SDs comprise >5% and <2% of these genomes, respectively (Cheung et al., 2003; Lander et al., 2001; Waterston et al., 2002). Since SDs are biased to gene-rich regions (Bailey et al., 2002), duplicated genes are free to acquire regulatory or coding sequence novelties due to relaxed selection (Ohno, 1970, 1999). Expansion of gene families can occur through this mechanism, and some human examples are confirmed to have been mediated at least in part by TEs, such as the glycophorin gene (Makalowski, 2000) and genes encoded in proximal 17p (Stankiewicz et al., 2004). In fact, a genome-wide analysis has indicated that Alus are observed at the flanks of >25% of human SDs, while the other repeat classes (LINEs, LTRs, and DNA transposons) combined associate with a further ~25% (Bailey et al., 2003). A similar analysis in mouse indicates that B1 and B2 elements are also significantly associated with breakpoint junctions (Jurka et al., 2005).
Other genome-wide analyses indicate that ~44% of human-chimpanzee inversions (Lee et al., 2008), and most human-gibbon synteny breakpoints (Girirajan et al., 2009) associate with either L1 or Alu elements.

1.5.2 TE-mediated evolution of new genes

The process of retroposition, or ‘trans-mobilization’, involves the reverse transcription of an mRNA and its subsequent re-integration in the genome, via the L1 machinery (Esnault et al., 2000). Retrocopies often exhibit accelerated rates of evolution, rely on the transcriptional signals of nearby host genes, and are predominantly expressed in the testis, whereas their parent copies are broadly expressed (Marques et al., 2005; Vinckenbosch et al., 2006). A recent report has identified ~4,000 retrocopies in the human genome, and ~75 of these are primate-specific, indicating a rate of emergence of 1 potentially functional retrocopy per million years since the divergence from rodents (Marques et al., 2005). A clear bias exists for functional retrocopies to move off the X chromosome onto autosomes (Emerson et al., 2004). In fact, ~1,000 human retrocopies are transcribed, of which ~600 encode intact ORFs (Vinckenbosch et al., 2006). In a remarkable example of L1-mediated retroposition, a processed cyclophilin gene (CYPA) was integrated into the intron of the TRIM5 gene, resulting in a TRIM5:CYPA chimera that confers resistance to HIV-1 infection in owl monkeys (Sayah et al., 2004). By a similar mechanism, exon 9 of the CFTR gene (Rozmahel et al., 1997) and exon 30 of the ATM gene (Ejima and Yang, 2003) have been dispersed throughout the human genome.

The evolution of new genes can also arise by 5’ or 3’ transduction of non-repetitive sequences (Chen et al., 2005). This event is principally mediated by L1s, and is achieved by bypassing a ‘weak’ native poly-adenylation signal in favour of a stronger one downstream (Han and Boeke, 2005; Moran et al., 1999).
The resulting chimeric mRNA, upon genomic integration, can potentially scatter coding or regulatory regions throughout the genome, and has been shown to interfere with the transcription of a ‘disrupted’ gene (Akagi et al., 2008). Numerous genome-wide analyses have discovered non-LINE sequences upstream of the 3’ TSD, along with a ‘borrowed’ poly-A signal (Goodier et al., 2000; Holmes et al., 1994; Moran et al., 1999; Myers et al., 2002; Pickeral et al., 2000). In fact, 23% and 15% of the human L1s analyzed in two different studies carry 3’ transduced sequence (Goodier et al., 2000; Pickeral et al., 2000). Recently, SVA elements have also been shown to mediate 3’ transduction of non-repeat sequence (Xing et al., 2006). A remarkable example of SVA-mediated 3’ transduction demonstrates the emergence of a gene family specifically in human, chimpanzee, and gorilla (Xing et al., 2006). A potential advantage of SVA-mediated transduction is its ability to also donate a promoter (Xing et al., 2006), whereas LINE elements are less likely to do so due to their larger size and frequent 5’ truncation (Dewannieux and Heidmann, 2005).

1.5.3 Exonized TEs within proteins

Facilitated by the presence of embedded splice signals, intronic SINEs, LINEs, and LTRs can be recruited into mRNAs (Belancio et al., 2006; Makalowski et al., 1994; Sorek et al., 2002; van de Lagemaat et al., 2006; Zemojtel et al., 2007) (Figure 1.5). However, due to the potentially disruptive influence of these splice signals on cellular genes, these elements are biased for the antisense orientation (Medstrand et al., 2002; van de Lagemaat et al., 2006). In fact, those intronic TEs that have been retained in the sense orientation are commonly incorporated into mRNAs (van de Lagemaat et al., 2006). Interestingly, L1s also contain splice signals in the reverse orientation (Zemojtel et al., 2007), as do Alu elements (Makalowski et al., 1994; Sorek et al., 2002).
Similar to L1s, Alus can provide functional splice signals regardless of their orientation, and due to their over-representation nearby or within transcription units (Lander et al., 2001; Medstrand et al., 2002), they are commonly recruited into mRNAs (Lin et al., 2008; Makalowski and Toda, 2007; Nekrutenko and Li, 2001; Sorek et al., 2002).

[Diagram labels: P, ATG, A(n)]

Figure 1.5 Recruitment of TEs into coding regions of host genes. Diagram of a gene with a promoter (P) and downstream exons (black boxes); thinner boxes denote the 5’ and 3’ UTRs. Blue arrows indicate the polarity of an integrated TE (note TEs can also be sense oriented, but are over-represented in the antisense orientation (Medstrand et al., 2002)). Splicing of the native mRNA is shown as a black line, and the alternative transcripts in blue.

The emergence of high-throughput sequencing technologies has permitted analyses of entire transcriptomes (Carninci et al., 2006; Kwan et al., 2008; Wang et al., 2008). A recent finding has demonstrated that ~85% of genes undergo alternative splicing (Wang et al., 2008), which is consistent with an earlier finding (Johnson et al., 2003). Furthermore, >5% of alternatively spliced genes incorporate an Alu as an exon (Gotea and Makalowski, 2006; Sorek et al., 2002). Interestingly, Alu splicing is not constitutive, thereby ensuring the continued production of ‘normal’ transcripts (Lin et al., 2008; Sorek et al., 2002). However, minor alternatively spliced isoforms permit ‘tinkering’ that could potentially result in adaptive changes to a gene product. Other analyses of expressed sequence databases have revealed that as many as 4% of human proteins incorporate TEs, with Alu and L1 fragments being the most common substrates of exonization (Gotea and Makalowski, 2006; Li et al., 2001; Nekrutenko and Li, 2001). In addition to bioinformatics discoveries, numerous experimentally characterized examples exist.
In many cases, it has been shown that inclusion of TE-derived sequence in a protein leads to altered function (Gotea and Makalowski, 2006; Nobukuni et al., 1997; Singer et al., 2004; Tang et al., 2000). In the most extreme examples, TE proteins themselves have been domesticated as cellular proteins by the host (Lander et al., 2001), as with the Drosophila ‘telomerase’ (Levis et al., 1993), the RAG1/RAG2 V(D)J recombinase (Zhou et al., 2004), and the use of DNA transposons as transcription factors (Feschotte, 2008). By far the most remarkable example arises from the domestication of unrelated ERV envelope proteins in a number of mammals and their potential role in placenta development (Rawn and Cross, 2008).

1.5.4 TE-mediated regulatory innovation of host genes

The most important impact of TEs on host genes may be through their embedded regulatory signals. While most retrotransposons encode defective proteins, many have retained their enhancers, promoters, and poly-adenylation signals. These abundant, ready-to-use regulatory modules represent both a significant genomic hazard and a considerable opportunity for gene regulatory evolution.

1.5.4.1 Regulatory effects – Transcriptional termination

The natural transcription termination signals harboured by LTRs and LINEs can potentially interfere with the transcription of a gene, if they are integrated in the sense orientation within introns (Medstrand et al., 2005) (Figure 1.6), potentially explaining the observed antisense bias (Medstrand et al., 2002; van de Lagemaat et al., 2003). However, a marked contrast in this orientation bias exists for TEs within 3’ UTRs (van de Lagemaat et al., 2003; van de Lagemaat et al., 2006), indicating a potentially important role in the regulation of poly-adenylation. Consistent with this observation, many examples are known whereby a domesticated TE provides a poly-adenylation signal to a gene; this is commonly mediated by LTRs (Brosius, 1999; Mager, 1989; Makalowski, 2000).
In fact, as many as a few hundred examples surfaced through sequencing of the human genome (Lander et al., 2001). Furthermore, plasmid-based assays indicate the presence of cryptic poly-A signals within L1s regardless of their orientation (Han et al., 2004; Perepelitsa-Belancio and Deininger, 2003). Notably, many examples of SINEs donating poly-A signals to cellular genes have also been reported (Brosius, 1999; Makalowski, 2000; Murnane and Morales, 1995). Importantly, a transcription termination signal can arise by a single point mutation in the poly-A tails of retrotransposons (Makalowski, 2000).

[Diagram labels: P, ATG, aataaa, A(n)]

Figure 1.6 TE-mediated aberrant poly-adenylation of host genes. Diagram of a gene with a promoter (P) and downstream exons (black boxes); thinner boxes denote the 5’ and 3’ UTRs. Blue arrows indicate the polarity of an integrated TE (note TEs can also be sense oriented, but are over-represented in the antisense orientation (Medstrand et al., 2002)). The transcription start site is indicated by a bent arrow, and poly-adenylation signals (aataaa) and poly-A tails (An) mark transcription termination.

1.5.4.2 Regulatory effects – Transcriptional initiation

Alteration in gene expression patterns is believed to be an important force in shaping the differences seen between mammals, since they exhibit a similar complement of genes (King and Wilson, 1975). Indeed, the regulatory motifs within many TEs are retained; therefore, they represent abundant ready-to-use regulatory modules within host genomes (Figure 1.7). Rapidly evolving genes, or those involved in immunity or host defense mechanisms, are the most likely to acquire or allow TEs to be retained as promoters (van de Lagemaat et al., 2003).
When the human and mouse genomes are compared, two typical scenarios of TE promoter domestication have been observed: a lineage-specific TE integration may only be used by one gene orthologue, or an ancestral integration is domesticated as a promoter by both orthologues (van de Lagemaat et al., 2003). Intriguingly, it may be more parsimonious for a gene to domesticate a TE, in order to obtain a more specialized expression profile, than it is to acquire the necessary point mutations to achieve the same result (Dunn et al., 2005).

[Diagram labels: P, ATG, A(n)]

Figure 1.7 Recruitment of TEs as promoters of host genes. A hypothetical gene with a promoter (P), exons (black boxes), and integrated TEs (blue arrows). Transcription start sites are denoted by bent arrows. Note that TEs can integrate and operate as promoters or enhancers (of native promoter) in either orientation.

1.5.4.2.1 Domestication of LTRs as promoters of host genes

The majority of LTRs are solitary elements and they have been fixed in the gene-poor regions of the human and mouse genomes (Lander et al., 2001; Medstrand et al., 2002; van de Lagemaat et al., 2003; Waterston et al., 2002). Nonetheless, a number of examples in human and mouse have documented the domestication of LTR-derived promoters by a cellular gene (Brosius, 1999; Dunn et al., 2003; Landry et al., 2002; Medstrand et al., 2001; Peaston et al., 2004; van de Lagemaat et al., 2003). It was previously shown that LTR elements integrated in the immediate upstream sequence of a gene are significantly more likely to be in the sense orientation (Dunn et al., 2005), pointing to a role in transcriptional control. In fact, the frequency of genes adopting LTR promoters is ~0.7%, translating to approximately 200 expected examples in both the human and mouse genomes (van de Lagemaat et al., 2003).
From the well-characterized examples of LTR promoter domestication events, it has become clear that LTRs tend to be adopted as alternative promoters that serve to augment transcription of a gene in a particular tissue or developmental stage (Peaston et al., 2004; van de Lagemaat et al., 2003). A particularly noteworthy study revealed that ERV-L elements are the most highly expressed transcripts in the mouse oocyte and early embryo (Peaston et al., 2004). Furthermore, Peaston and colleagues (Peaston et al., 2004) identified a plethora of LTR:host gene chimeras, some of which generated protein isoforms. Another set of particularly notable findings in mouse documents the phenomenon of LTR-derived metastable epialleles (Rakyan et al., 2002). In these examples, variable methylation of domesticated LTR promoters among littermates results in altered expression levels of the downstream gene (Morgan et al., 1999; Rakyan et al., 2003; Whitelaw and Martin, 2001). Genome-wide analyses of well-characterized human genes indicate that a significant proportion of promoter regions are occupied by LTR elements or other TEs (Conley et al., 2008; Jordan et al., 2003; van de Lagemaat et al., 2003). A subsequent analysis of the DSCR4 and DSCR8 genes, prompted by the findings of van de Lagemaat and colleagues (van de Lagemaat et al., 2003), indicated that an ERV1 LTR element operates as a bidirectional promoter for these genes (Dunn et al., 2006). Interestingly, recent ChIP-Seq experiments have revealed that ~35% of in vivo p53 and ~40% of mouse Oct4/Sox2 binding sites reside within ERV1 and ERV-K elements, respectively (Bourque et al., 2008; Wang et al., 2007). Another report found that Sp1 binding sites are important for the transcription of mouse ETn elements (Maksakova and Mager, 2005).

1.5.4.2.2 Domestication of LINEs as promoters of host genes

Many genomic LINEs are 5’ truncated (Dewannieux and Heidmann, 2005); therefore, they do not carry their internal promoters.
As stated previously, the L1s that are found in the upstream region of genes are biased toward the antisense orientation (Medstrand et al., 2002; Smit, 1999). Nonetheless, L1s have been demonstrated to possess antisense promoter activity both in vitro and in vivo (Akagi et al., 2008; Lavie et al., 2004; Matlik et al., 2006; Nigumann et al., 2002; Speek, 2001). Like the native promoter, the antisense L1 promoter is located within the 5’ UTR; therefore, its potential effect is also limited by 5’ truncation. Indeed, LTRs are more commonly domesticated as gene promoters; however, numerous examples of LINE:cellular gene fusion transcripts have been documented (Akagi et al., 2008; Matlik et al., 2006; Speek, 2001; van de Lagemaat et al., 2003; Zaiss and Kloetzel, 1999). A bioinformatics study revealed that LINEs may also function as composite promoters in conjunction with adjacent TEs (van de Lagemaat et al., 2003). Several reports indicate that binding sites for YY1, SRY, and RUNX3 are important for L1 transcriptional initiation (Athanikar et al., 2004; Tchenio et al., 2000; Yang et al., 2003).

1.5.4.2.3 Domestication of SINEs as promoters of host genes

Due to their proximity to genes (Lander et al., 2001; Medstrand et al., 2002; Waterston et al., 2002), SINEs are potentially the most important TE-derived regulatory elements, despite paradoxically being pol III responsive (Chu et al., 1995). Only recently have SINEs, particularly the Alu elements, emerged as potentially important pol II transcriptional regulatory modules (Shankar et al., 2004; Tomilin, 2008). A limited set of examples exists whereby genes have domesticated SINEs as direct promoters: the rodent NKG2D and Lama3 (Ferrigno et al., 2001; Lai et al., 2009) and the human p75TNFR (Singer et al., 2004).
However, a comparatively large number of examples in human and rodents demonstrate that SINEs may contribute to regulatory regions by providing transcription factor binding sites (Brosius, 1999; Tomilin, 2008). Accumulating data indicate that Alus in particular harbour binding motifs for a number of pol II transcription factors and hormone response elements, such as AP-1, Sp1, p53, ERE, RARE, and TRE (Polak and Domany, 2006; Shankar et al., 2004; Zemojtel et al., 2009). An analysis of all gene-associated Alus on chromosome 22 revealed that genes involved in signaling and metabolism are over-represented for the TF binding sites listed above (Shankar et al., 2004). While many of these sites are present in the consensus sequences of different Alu family members, others are derived by mutation (Shankar et al., 2004; Zemojtel et al., 2009). One interesting example indicates that methylated CpGs within Alus, believed to account for ~33% of all genomic CpGs (Schmid, 1991), can yield functional p53 binding sites when spontaneously deaminated (Zemojtel et al., 2009). Recent ChIP-Seq studies have revealed that ~15% of human in vivo p53 binding sites and ~20% of ER binding sites reside within SINEs (Bourque et al., 2008; Zemojtel et al., 2009). In mouse, B2 elements provide ~35% of all CTCF binding sites in vivo (Bourque et al., 2008).

1.6 A case study of TE-mediated regulatory innovation by a mammalian gene

The neuronal apoptosis inhibitory protein (NAIP) gene has orthologous copies in rodents and primates, indicating that an ancestral gene existed prior to their divergence >70 million years ago (Roy et al., 1995; Yaraghi et al., 1998). Since the divergence of these mammalian lineages, however, their NAIP genes have followed unique evolutionary trajectories, yet have remarkably converged on the same outcome (discussed in 1.6.2).
An earlier bioinformatics analysis searching for human and mouse transcripts with 5’ termini that overlapped LTR elements identified NAIP as an interesting case in which to explore the acquisition of regulatory innovations (L. van de Lagemaat, unpublished observation).

1.6.1 NAIP domain structure

The cloning of NAIP as a candidate gene for the neurodegenerative disorder spinal muscular atrophy (SMA) approximately 15 years ago (Roy et al., 1995) initiated a chain reaction of research that continues to gain momentum today. Initial sequencing and comparison to information within the fledgling BLAST database indicated amino acid similarity with two baculoviral proteins, Cydia pomonella inhibitor of apoptosis protein (CpIAP) and Orgyia pseudotsugata IAP (OpIAP). These genes are capable of inhibiting virally-induced apoptosis in insect cells (Clem and Miller, 1994). Subsequent investigation revealed numerous other IAPs in most biological kingdoms (LaCasse et al., 1998; Liston et al., 1996), suggesting that regulation of programmed cell death is an evolutionarily conserved process. The unifying feature of all IAPs is the presence of the baculoviral IAP repeat (BIR) domain at the N-terminus, in up to three copies (Vaux and Silke, 2005). Upon sequencing of the human and mouse genomes, the full diversity of the IAP family became clear, with eight members encoded in each organism (Figure 1.8) (Lander et al., 2001; Waterston et al., 2002). Interestingly, one of the IAP genes appears to have arisen by retroposition in both human and mouse. The C-terminal domains of IAPs, however, are more flexible, presumably a result of the need to diversify protein interaction domains (O'Riordan et al., 2008).

Figure 1.8. NAIP protein domains and familial designations. NAIP is part of two protein families, the Inhibitor of Apoptosis Proteins (IAP) and the Nod-like Receptors (NLR).
The NAIP gene is shown at the top of both columns and other human and mouse family members beneath. Domains of the corresponding proteins (large black rectangle) are as indicated in each coloured rectangle. BIR: baculoviral IAP repeat domain (gold); NBD: nucleotide binding domain (green); LRR: leucine rich repeat (red); RING: really interesting new gene (light blue); CARD: caspase activation and recruitment domain (grey); UBC: ubiquitin-conjugating enzyme (tan); and PYD: pyrin (deep blue).

Along with identifying the N-terminal NAIP BIRs, Roy and colleagues (Roy et al., 1995) correctly identified a central GTP/ATP binding site as well as a C-terminal prokaryotic lipid attachment site. Only a number of years later did the importance of these domains, now classified as the nucleotide binding domain (NBD) and leucine rich repeats (LRRs), become apparent. As a result of the publication of the draft human and mouse genomes, it emerged that a new superfamily existed, unified by C-terminal NBD and LRR domains (Harton et al., 2002) (Figure 1.8). Initially ascribed a variety of names, the family was eventually given a standardized nomenclature, the NOD (for nucleotide-binding oligomerization domain)-like receptor (NLR) classification (Ting et al., 2008). The NLRs, so named due to their similarity, both structurally and functionally, to the toll-like receptors (TLRs) of the mammalian innate immune system, comprise a family of more than 20 individual members in both human and mouse (Wilmanski et al., 2008).

1.6.2 NAIP chromosomal arrangement

Structural variation of the syntenic regions encoding NAIP and surrounding loci in the human and mouse genomes is complex (Fortier et al., 2005; Fortna et al., 2004; Schmutz et al., 2004) (Figure 1.9). Since divergence of the primate and rodent lineages, this region has followed unique paths in each species, yet they have converged on the same principle of copy number expansion (Growney et al., 2000).
In fact, this region appears to be currently undergoing rapid evolution, as copy number variations are widely reported between human individuals (Chen et al., 1998; Schmutz et al., 2004; Tran et al., 2008) and between inbred mouse strains (Growney and Dietrich, 2000). Initial sequencing of the human genome assembled a region of 5q13.2 that encoded one full-length copy and a pseudogene lacking the first two coding exons (Lefebvre et al., 1995; Roy et al., 1995). These were oriented within a large (500 kb) inverted duplication that also included numerous other genes: SMN (the SMA disease gene); GTF2H2; SERF1; and GUSBP1 (Figure 1.9). Resequencing of the SMA region in two different individuals revealed the existence of 3 and 5 NAIP copies, respectively (Schmutz et al., 2004); both regions, SMAvar1 and SMAvar2, encode one full-length NAIP copy (NAIPfull) and 2 or 4 variably deleted pseudogenes. Sequencing of the Legionella critical interval (Lgn) in different mouse strains revealed a different organization of the Naip genes (Growney and Dietrich, 2000). Variable copy numbers of the gene exist in all strains studied, but they are arranged in tandem, head to tail (Figure 1.9). Whereas the 129 mouse strain encodes 7 copies and 3 pseudogenes, the C57BL/6 strain encodes 4 full-length copies and one pseudogene (Growney and Dietrich, 2000).

Figure 1.9. Genomic organization of the human and mouse NAIP copies. Diagram of chromosomal regions encoding NAIP and surrounding genes; polarity is indicated by the direction of block arrows. The chromosomal band is indicated at the right of each diagram.

Both human and mouse NAIPfull contain 14 coding exons, with the termination codon embedded in the same exon as a long 3’ UTR. However, the human and mouse 5’ UTRs are not homologous, which supports a lineage-specific acquisition of regulatory sequences (Figure 1.10). Interestingly, even the B6 mNaip copies exhibit dissimilar 5’ UTRs.
The nucleotide sequences of the ORFs, however, exhibit ~90% identity; one extreme example is provided by the mNaip5 and mNaip6 copies, which share >99% sequence identity, indicating a very recent duplication. By contrast, the human NAIP copies are virtually identical to one another across the UTRs (if present) and ORF. A comparison of the human and mouse copies reveals that the ORFs possess ~75% identity.

Figure 1.10. Dot plot comparing human and mouse NAIP upstream regions. Alignment of 20 kb centered on the putative human (y-axis) and mouse (x-axis) NAIP LTR promoters (blue and red rectangles, respectively). Regions of sequence homology were determined by a 25 bp sliding window analysis and are each denoted by a dot. To differentiate true homology from noise, the alignments were rooted on the first protein-coding exon, where ATG denotes the translation initiation codon.

1.6.3 NAIP function

The anti-apoptosis researchers had nearly a 10-year head start on their innate immunity colleagues, and as such a plethora of early reports investigated the role of IAPs in sequestering caspases. An early analysis, similar to one using CpIAP in insect cells (Clem and Miller, 1994), revealed that NAIP could indeed inhibit apoptosis in mammalian cells in response to various stimuli (Liston et al., 1996). This suppressive effect was specifically due to the inactivation of caspases 3 and 7 (Maier et al., 2002) and caspase 9 (Davoodi et al., 2004). The assumption that NAIP was specifically active in neuronal cells, consistent with it being an SMA candidate gene (Roy et al., 1995), was verified in early experiments using rodent models of seizure (Holcik et al., 2000) and stroke (Xu et al., 1997a). NAIP was also implicated in enhancing the survival of granulosa cells, important regulators of oocyte development (Matsumoto et al., 1999), the first evidence that NAIP was not restricted to neuronal cells.
A natural progression in the study of IAPs is the investigation of their potential upregulation in cancers (LaCasse et al., 1998), which is discussed later (1.6.4). The role of the NLR-defining domains was not apparent at first and received little attention, because of a fascination with the anti-apoptotic properties of NAIP and the other IAPs. Although the NBD and LRR domains were correctly predicted upon cloning (Roy et al., 1995), subsequent studies did not address their role. A syntenic chromosomal region in the 129 mouse strain was shown to harbour multiple Naip copies that were linked with susceptibility to a pneumonia-inducing bacterium, Legionella pneumophila (Scharf et al., 1996; Yaraghi et al., 1998). Subsequently, it was shown that one particular Naip copy conferred resistance to L. pneumophila replication in host macrophages (Diez et al., 2003), but these results were not interpreted in the context of the innate immune response. A plethora of recent publications have cemented the additional status of both human and mouse NAIP as a potent cytosolic pathogen recognition sensor (Molofsky et al., 2006; Ren et al., 2006; Zamboni et al., 2006). The mechanism of NAIP action in the innate immune system involves LRR-dependent sensing of bacterial flagellin, followed by formation of the inflammasome (a complex of proteins that activates caspase 1). Assembly of the inflammasome leads to the activation of caspase 1, which in turn processes the proinflammatory cytokines pro-IL-1β and pro-IL-18 into their active forms (Fritz et al., 2006; Miao et al., 2006). A rapid cell death is executed (Molofsky et al., 2006; Ren et al., 2006), in striking contrast to the documented functions of NAIP in apoptosis inhibition.

1.6.4 NAIP expression

The expression of NAIP at both the RNA and protein levels has been previously studied in human and various mouse strains, and to a lesser extent in rat.
In these species, analyses were initially restricted to brain and motor neurons, owing to the connection of NAIP with SMA. Several reports show expression in human brain samples by RT-PCR, in situ hybridization, and western blot (Maier et al., 2007; Notarbartolo et al., 2002; Roy et al., 1995; Yamamoto et al., 1999), and in patient samples with various forms of neurodegenerative disease (Christie et al., 2007; Hebb et al., 2008; Seidl et al., 1999). Similar investigations of rodent CNS have been performed, and reveal brain-specific expression throughout all stages of embryonic and post-natal development (Ingram-Crooks et al., 2002; Matsumoto et al., 1999; Shin et al., 2003; Xu et al., 1997b; Yaraghi et al., 1998). In fact, broad expression of Naip is observed in all stages of mouse development (Ingram-Crooks et al., 2002), indicating that expression of this gene is not restricted to neuronal lineages. Consistent with this observation, NAIP is now understood to be widely expressed in both human and mouse across many developmental stages and organs/tissues (Ka and Hunt, 2003; Maier et al., 2007; Matsumoto et al., 1999; Roy et al., 1995; Yamamoto et al., 1999; Yaraghi et al., 1998). Notably, its expression is consistently observed, and is often highest, in hematopoietic-related organs and cells, such as lung, spleen, thymus, peripheral blood leukocytes, macrophages, and dendritic cells (Huang et al., 1999; Nakagawa et al., 2005; Vinzing et al., 2008; Wright et al., 2003; Xu et al., 2002; Yamamoto et al., 2004; Yamamoto et al., 1999). Accordingly, upregulation of NAIP in various leukemias has also been reported (Nakagawa et al., 2005; Yamamoto et al., 2004), as well as in drug-resistant HL-60 cell lines (Notarbartolo et al., 2002).
Furthermore, other reports indicate increased expression in breast cancer patients (Choi et al., 2007), in prostate cancer cell lines (McEleny et al., 2002; Nomura et al., 2005) and xenograft models (Chu et al., unpublished results).

1.7 Thesis objectives

Since most human TEs are no longer capable of insertional mutagenesis, the function of these ubiquitous relics is the subject of debate. Increased attention has focused on their potential effects in facilitating evolution, such as by mediating genomic rearrangements, directing transcriptional innovation, and establishing transcriptional networks. The role of TEs in promoting functional diversification of the human transcriptome and proteome is now of particular interest, because recent advances in high-throughput technologies permit such analyses. Indeed, the RNA polymerase II regulatory motifs embedded within TEs represent not only a great hazard to cellular genes, but also a significant opportunity.

In Chapter 2 of this thesis, I address the hypothesis that the human and mouse NAIP genes have in fact domesticated unrelated LTRs as promoters. The goal of this work was to validate the findings of a preliminary bioinformatics analysis, and I demonstrate that in human, the more typical scenario of tissue-specific LTR promoter use is apparent. However, in mouse and rat, my data indicate that an LTR is the principal gene promoter. A version of this chapter has been published: Romanish, M.T., Lock, W.M., van de Lagemaat, L.N., Dunn, C.A., and Mager, D.L. (2007). Repeated recruitment of LTR retrotransposons as promoters by the antiapoptotic locus NAIP during mammalian evolution. PLoS Genetics 3, e10.

In Chapter 3 of this thesis, I address the hypothesis that the multiple copies of human NAIP, revealed through improved assembly of chromosome 5q13.2, are expressed.
I elaborated on the theme of TE-mediated regulatory innovation of the human NAIP genes by discovering that, among several novel promoters, two Alu SINEs are the unlikely source of transcriptional regulatory signals. Remarkably, transcription from these promoters is demonstrated to give rise to novel NAIP isoforms. A version of this chapter has been published: Romanish, M.T., Nakamura, H.N., Lai, C.B., Wang, Y.Z., Mager, D.L. (2009). A novel protein isoform of the multicopy NAIP genes derives from intragenic Alu SINE promoters. PLoS One, 4, e5761.

Finally, in Chapter 4 of my thesis, I discuss these results in the context of transposable element-mediated effects on transcriptome and proteome evolution. I also address how advances in high-throughput sequencing and bioinformatics approaches can be used to better understand some outstanding questions concerning TE domestication.

1.8 References

Akagi, K., Li, J., Stephens, R.M., Volfovsky, N., and Symer, D.E. (2008). Extensive variation between inbred mouse strains due to endogenous L1 retrotransposition. Genome Res 18, 869-880.
Alfoldi, J.E. (2008). Sequence of mouse Y chromosome. In Dept of Biology (Boston, MIT).
Athanikar, J.N., Badge, R.M., and Moran, J.V. (2004). A YY1-binding site is required for accurate human LINE-1 transcription initiation. Nucleic Acids Res 32, 3846-3855.
Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E. (2002). Recent segmental duplications in the human genome. Science 297, 1003-1007.
Bailey, J.A., Liu, G., and Eichler, E.E. (2003). An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet 73, 823-834.
Bannert, N., and Kurth, R. (2004). Retroelements and the human genome: new perspectives on an old relation. Proc Natl Acad Sci U S A 101 Suppl 2, 14572-14579.
Batzer, M.A., Arcot, S.S., Phinney, J.W., Alegria-Hartman, M., Kass, D.H., Milligan, S.M., Kimpton, C., Gill, P., Hochmeister, M., Ioannou, P.A., et al. (1996). Genetic variation of recent Alu insertions in human populations. J Mol Evol 42, 22-29.
Batzer, M.A., and Deininger, P.L. (2002). Alu repeats and human genomic diversity. Nat Rev Genet 3, 370-379.
Belancio, V.P., Hedges, D.J., and Deininger, P. (2006). LINE-1 RNA splicing and influences on mammalian gene expression. Nucleic Acids Res 34, 1512-1521.
Belshaw, R., Dawson, A.L., Woolven-Allen, J., Redding, J., Burt, A., and Tristem, M. (2005). Genomewide screening reveals high levels of insertional polymorphism in the human endogenous retrovirus family HERV-K(HML2): implications for present-day activity. J Virol 79, 12507-12514.
Benit, L., De Parseval, N., Casella, J.F., Callebaut, I., Cordonnier, A., and Heidmann, T. (1997). Cloning of a new murine endogenous retrovirus, MuERV-L, with strong similarity to the human HERV-L element and with a gag coding sequence closely related to the Fv1 restriction gene. J Virol 71, 5652-5657.
Boeke, J.D., and Stoye, J.P. (1997). Retrotransposons, endogenous retroviruses, and the evolution of retroelements. In Retroviruses, J.M. Coffin, S.H. Hughes, and H.E. Varmus, eds. (Cold Spring Harbor, Cold Spring Harbor Laboratory Press), pp. 343-436.
Bourque, G., Leong, B., Vega, V.B., Chen, X., Lee, Y.L., Srinivasan, K.G., Chew, J.L., Ruan, Y., Wei, C.L., Ng, H.H., et al. (2008). Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res 18, 1752-1762.
Brady, T., Lee, Y.N., Ronen, K., Malani, N., Berry, C.C., Bieniasz, P.D., and Bushman, F.D. (2009). Integration target site selection by a resurrected human endogenous retrovirus. Genes Dev 23, 633-642.
Britten, R.J., and Davidson, E.H. (1969). Gene regulation for higher cells: a theory. Science 165, 349-357.
Brosius, J. (1999).
RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238, 115-134.
Brouha, B., Schustak, J., Badge, R.M., Lutz-Prigge, S., Farley, A.H., Moran, J.V., and Kazazian, H.H., Jr. (2003). Hot L1s account for the bulk of retrotransposition in the human population. Proc Natl Acad Sci U S A 100, 5280-5285.
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C.A., Taylor, M.S., Engstrom, P.G., Frith, M.C., et al. (2006). Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 38, 626-635.
Chen, J.M., Stenson, P.D., Cooper, D.N., and Ferec, C. (2005). A systematic analysis of LINE-1 endonuclease-dependent retrotranspositional events causing human genetic disease. Hum Genet 117, 411-427.
Chen, Q., Baird, S.D., Mahadevan, M., Besner-Johnston, A., Farahani, R., Xuan, J., Kang, X., Lefebvre, C., Ikeda, J.E., Korneluk, R.G., et al. (1998). Sequence of a 131-kb region of 5q13.1 containing the spinal muscular atrophy candidate genes SMN and NAIP. Genomics 48, 121-127.
Cheung, J., Wilson, M.D., Zhang, J., Khaja, R., MacDonald, J.R., Heng, H.H., Koop, B.F., and Scherer, S.W. (2003). Recent segmental and gene duplications in the mouse genome. Genome Biol 4, R47.
Choi, J., Hwang, Y.K., Choi, Y.J., Yoo, K.E., Kim, J.H., Nam, S.J., Yang, J.H., Lee, S.J., Yoo, K.H., Sung, K.W., et al. (2007). Neuronal apoptosis inhibitory protein is overexpressed in patients with unfavorable prognostic factors in breast cancer. J Korean Med Sci 22 Suppl, S17-23.
Christie, L.A., Su, J.H., Tu, C.H., Dick, M.C., Zhou, J., and Cotman, C.W. (2007). Differential regulation of inhibitors of apoptosis proteins in Alzheimer's disease brains. Neurobiol Dis 26, 165-173.
Chu, W.M., Liu, W.M., and Schmid, C.W. (1995). RNA polymerase III promoter and terminator elements affect Alu RNA expression. Nucleic Acids Res 23, 1750-1757.
Clem, R.J., and Miller, L.K. (1994).
Control of programmed cell death by the baculovirus genes p35 and iap. Mol Cell Biol 14, 5212-5222.
Coghlan, A., Eichler, E.E., Oliver, S.G., Paterson, A.H., and Stein, L. (2005). Chromosome evolution in eukaryotes: a multi-kingdom perspective. Trends Genet 21, 673-682.
Conley, A.B., Piriyapongsa, J., and Jordan, I.K. (2008). Retroviral promoters in the human genome. Bioinformatics 24, 1563-1567.
Cordaux, R., Hedges, D.J., Herke, S.W., and Batzer, M.A. (2006). Estimating the retrotransposition rate of human Alu elements. Gene 373, 134-137.
Cost, G.J., and Boeke, J.D. (1998). Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry 37, 18081-18093.
Costas, J. (2001). Evolutionary dynamics of the human endogenous retrovirus family HERV-K inferred from full-length proviral genomes. J Mol Evol 53, 237-243.
Davoodi, J., Lin, L., Kelly, J., Liston, P., and MacKenzie, A.E. (2004). Neuronal apoptosis-inhibitory protein does not interact with Smac and requires ATP to bind caspase-9. J Biol Chem 279, 40622-40628.
de Parseval, N., and Heidmann, T. (2005). Human endogenous retroviruses: from infectious elements to human genes. Cytogenet Genome Res 110, 318-332.
Deininger, P.L., and Batzer, M.A. (1999). Alu repeats and human disease. Mol Genet Metab 67, 183-193.
Deininger, P.L., Batzer, M.A., Hutchison, C.A., 3rd, and Edgell, M.H. (1992). Master genes in mammalian repetitive DNA amplification. Trends Genet 8, 307-311.
Dewannieux, M., Esnault, C., and Heidmann, T. (2003). LINE-mediated retrotransposition of marked Alu sequences. Nat Genet 35, 41-48.
Dewannieux, M., and Heidmann, T. (2005). LINEs, SINEs and processed pseudogenes: parasitic strategies for genome modeling. Cytogenet Genome Res 110, 35-48.
Diez, E., Lee, S.H., Gauthier, S., Yaraghi, Z., Tremblay, M., Vidal, S., and Gros, P. (2003).
Birc1e is the gene within the Lgn1 locus associated with resistance to Legionella pneumophila. Nat Genet 33, 55-60.
Doolittle, W.F., and Sapienza, C. (1980). Selfish genes, the phenotype paradigm and genome evolution. Nature 284, 601-603.
Dunn, C.A., Medstrand, P., and Mager, D.L. (2003). An endogenous retroviral long terminal repeat is the dominant promoter for human beta1,3-galactosyltransferase 5 in the colon. Proc Natl Acad Sci U S A 100, 12841-12846.
Dunn, C.A., Romanish, M.T., Gutierrez, L.E., van de Lagemaat, L.N., and Mager, D.L. (2006). Transcription of two human genes from a bidirectional endogenous retrovirus promoter. Gene 366, 335-342.
Dunn, C.A., van de Lagemaat, L.N., Baillie, G.J., and Mager, D.L. (2005). Endogenous retrovirus long terminal repeats as ready-to-use mobile promoters: the case of primate beta3GAL-T5. Gene 364, 2-12.
Ejima, Y., and Yang, L. (2003). Trans mobilization of genomic DNA as a mechanism for retrotransposon-mediated exon shuffling. Hum Mol Genet 12, 1321-1328.
Emerson, J.J., Kaessmann, H., Betran, E., and Long, M. (2004). Extensive gene traffic on the mammalian X chromosome. Science 303, 537-540.
Esnault, C., Maestre, J., and Heidmann, T. (2000). Human LINE retrotransposons generate processed pseudogenes. Nat Genet 24, 363-367.
Feng, Q., Moran, J.V., Kazazian, H.H., Jr., and Boeke, J.D. (1996). Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905-916.
Ferrigno, O., Virolle, T., Djabari, Z., Ortonne, J.P., White, R.J., and Aberdam, D. (2001). Transposable B2 SINE elements can provide mobile RNA polymerase II promoters. Nat Genet 28, 77-81.
Feschotte, C. (2008). Transposable elements and the evolution of regulatory networks. Nat Rev Genet 9, 397-405.
Fortier, A., Diez, E., and Gros, P. (2005). Naip5/Birc1e and susceptibility to Legionella pneumophila. Trends Microbiol 13, 328-335.
Fortna, A., Kim, Y., MacLaren, E., Marshall, K., Hahn, G., Meltesen, L., Brenton, M., Hink, R., Burgers, S., Hernandez-Boussard, T., et al. (2004). Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biol 2, E207.
Fritz, J.H., Ferrero, R.L., Philpott, D.J., and Girardin, S.E. (2006). Nod-like proteins in immunity, inflammation and disease. Nat Immunol 7, 1250-1257.
Garfinkel, D.J. (2005). Genome evolution mediated by Ty elements in Saccharomyces. Cytogenet Genome Res 110, 63-69.
Girirajan, S., Chen, L., Graves, T., Marques-Bonet, T., Ventura, M., Fronick, C., Fulton, L., Rocchi, M., Fulton, R.S., Wilson, R.K., et al. (2009). Sequencing human-gibbon breakpoints of synteny reveals mosaic new insertions at rearrangement sites. Genome Res 19, 178-190.
Goodier, J.L., and Kazazian, H.H., Jr. (2008). Retrotransposons revisited: the restraint and rehabilitation of parasites. Cell 135, 23-35.
Goodier, J.L., Ostertag, E.M., and Kazazian, H.H., Jr. (2000). Transduction of 3'-flanking sequences is common in L1 retrotransposition. Hum Mol Genet 9, 653-657.
Gotea, V., and Makalowski, W. (2006). Do transposable elements really contribute to proteomes? Trends Genet 22, 260-267.
Green, E.D., and Chakravarti, A. (2001). The human genome sequence expedition: views from the "base camp". Genome Res 11, 645-651.
Gregory, T.R. (2001). Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma. Biol Rev Camb Philos Soc 76, 65-101.
Grover, D., Mukerji, M., Bhatnagar, P., Kannan, K., and Brahmachari, S.K. (2004). Alu repeat analysis in the complete human genome: trends and variations with respect to genomic composition. Bioinformatics 20, 813-817.
Growney, J.D., and Dietrich, W.F. (2000). High-resolution genetic and physical map of the Lgn1 interval in C57BL/6J implicates Naip2 or Naip5 in Legionella pneumophila pathogenesis. Genome Res 10, 1158-1171.
Growney, J.D., Scharf, J.M., Kunkel, L.M., and Dietrich, W.F. (2000).
Evolutionary divergence of the mouse and human Lgn1/SMA repeat structures. Genomics 64, 62-81.
Han, J.S., and Boeke, J.D. (2005). LINE-1 retrotransposons: modulators of quantity and quality of mammalian gene expression? Bioessays 27, 775-784.
Han, J.S., Szak, S.T., and Boeke, J.D. (2004). Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature 429, 268-274.
Harton, J.A., Linhoff, M.W., Zhang, J., and Ting, J.P. (2002). Cutting edge: CATERPILLER: a large family of mammalian genes containing CARD, pyrin, nucleotide-binding, and leucine-rich repeat domains. J Immunol 169, 4088-4093.
Hebb, A.L., Moore, C.S., Bhan, V., Campbell, T., Fisk, J.D., Robertson, H.A., Thorne, M., Lacasse, E., Holcik, M., Gillard, J., et al. (2008). Expression of the inhibitor of apoptosis protein family in multiple sclerosis reveals a potential immunomodulatory role during autoimmune mediated demyelination. Mult Scler 14, 577-594.
Holcik, M., Thompson, C.S., Yaraghi, Z., Lefebvre, C.A., MacKenzie, A.E., and Korneluk, R.G. (2000). The hippocampal neurons of neuronal apoptosis inhibitory protein 1 (NAIP1)-deleted mice display increased vulnerability to kainic acid-induced injury. Proc Natl Acad Sci U S A 97, 2286-2290.
Holmes, S.E., Dombroski, B.A., Krebs, C.M., Boehm, C.D., and Kazazian, H.H., Jr. (1994). A new retrotransposable human L1 element from the LRE2 locus on chromosome 1q produces a chimaeric insertion. Nat Genet 7, 143-148.
Hua-Van, A., Le Rouzic, A., Maisonhaute, C., and Capy, P. (2005). Abundance, distribution and dynamics of retrotransposable elements and transposons: similarities and differences. Cytogenet Genome Res 110, 426-440.
Huang, S., Scharf, J.M., Growney, J.D., Endrizzi, M.G., and Dietrich, W.F. (1999). The mouse Naip gene cluster on Chromosome 13 encodes several distinct functional transcripts. Mamm Genome 10, 1032-1035.
Ingram-Crooks, J., Holcik, M., Drmanic, S., and MacKenzie, A.E. (2002).
Distinct expression of neuronal apoptosis inhibitory protein (NAIP) during murine development. Neuroreport 13, 397-402. Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R., and Shoemaker, D.D. (2003). Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302, 2141-2144. Jordan, I.K., Rogozin, I.B., Glazko, G.V., and Koonin, E.V. (2003). Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet 19, 68-72. Jurka, J., Kohany, O., Pavlicek, A., Kapitonov, V.V., and Jurka, M.V. (2005). Clustering, duplication and chromosomal distribution of mouse SINE retrotransposons. Cytogenet Genome Res 110, 117-123. Ka, H., and Hunt, J.S. (2003). Temporal and spatial patterns of expression of inhibitors of apoptosis in human placentas. Am J Pathol 163, 413-422. Kazazian, H.H., Jr. (2004). Mobile elements: drivers of genome evolution. Science 303, 1626-1632. Kazazian, H.H., Jr., and Moran, J.V. (1998). The impact of L1 retrotransposons on the human genome. Nat Genet 19, 19-24. Kidwell, M.G. (2002). Transposable elements and the evolution of genome size in eukaryotes. Genetica 115, 49-63. Kidwell, M.G., and Lisch, D.R. (2000). Transposable elements and host genome evolution. Trends Ecol Evol 15, 95-99. King, M.C., and Wilson, A.C. (1975). Evolution at two levels in humans and chimpanzees. Science 188, 107-116. Kulpa, D.A., and Moran, J.V. (2006). Cis-preferential LINE-1 reverse transcriptase activity in ribonucleoprotein particles. Nat Struct Mol Biol 13, 655-660. Kwan, T., Benovoy, D., Dias, C., Gurd, S., Provencher, C., Beaulieu, P., Hudson, T.J., Sladek, R., and Majewski, J. (2008). Genome-wide analysis of transcript isoform variation in humans. Nat Genet 40, 225-231. LaCasse, E.C., Baird, S., Korneluk, R.G., and MacKenzie, A.E. (1998). The inhibitors of apoptosis (IAPs) and their emerging role in cancer.
Oncogene 17, 3247-3259. Lai, C.B., Zhang, Y., Rogers, S.L., and Mager, D.L. (2009). Creation of the two isoforms of rodent NKG2D was driven by a B1 retrotransposon insertion. Nucleic Acids Res 37, 3032-3043. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921. Landry, J.R., Rouhi, A., Medstrand, P., and Mager, D.L. (2002). The Opitz syndrome gene Mid1 is transcribed from a human endogenous retroviral promoter. Mol Biol Evol 19, 1934-1942. Lavie, L., Maldener, E., Brouha, B., Meese, E.U., and Mayer, J. (2004). The human L1 promoter: variable transcription initiation sites and a major impact of upstream flanking sequence on promoter activity. Genome Res 14, 2253-2260. Lee, J., Han, K., Meyer, T.J., Kim, H.S., and Batzer, M.A. (2008). Chromosomal inversions between human and chimpanzee lineages caused by retrotransposons. PLoS ONE 3, e4047. Lefebvre, S., Burglen, L., Reboullet, S., Clermont, O., Burlet, P., Viollet, L., Benichou, B., Cruaud, C., Millasseau, P., Zeviani, M., et al. (1995). Identification and characterization of a spinal muscular atrophy-determining gene. Cell 80, 155-165. Levis, R.W., Ganesan, R., Houtchens, K., Tolar, L.A., and Sheen, F.M. (1993). Transposons in place of telomeric repeats at a Drosophila telomere. Cell 75, 1083-1093. Li, W.H., Gu, Z., Wang, H., and Nekrutenko, A. (2001). Evolutionary analyses of the human genome. Nature 409, 847-849. Lin, L., Shen, S., Tye, A., Cai, J.J., Jiang, P., Davidson, B.L., and Xing, Y. (2008). Diverse splicing patterns of exonized Alu elements in human tissues. PLoS Genet 4, e1000225. Liston, P., Roy, N., Tamai, K., Lefebvre, C., Baird, S., Cherton-Horvat, G., Farahani, R., McLean, M., Ikeda, J.E., MacKenzie, A., et al. (1996). Suppression of apoptosis in mammalian cells by NAIP and a related family of IAP genes. Nature 379, 349-353. Mager, D.L.
(1989). Polyadenylation function and sequence variability of the long terminal repeats of the human endogenous retrovirus-like family RTVL-H. Virology 173, 591-599. Mager, D.L., and Medstrand, P. (2003). Retroviral repeat sequences. In Nature Encyclopedia of the human genome, D. Cooper, ed. (Hampshire (United Kingdom), Macmillan Publishers), pp. 57-63. Maier, J.K., Balabanian, S., Coffill, C.R., Stewart, A., Pelletier, L., Franks, D.J., Gendron, N.H., and MacKenzie, A.E. (2007). Distribution of neuronal apoptosis inhibitory protein in human tissues. J Histochem Cytochem 55, 911-923. Maier, J.K., Lahoua, Z., Gendron, N.H., Fetni, R., Johnston, A., Davoodi, J., Rasper, D., Roy, S., Slack, R.S., Nicholson, D.W., et al. (2002). The neuronal apoptosis inhibitory protein is a direct inhibitor of caspases 3 and 7. J Neurosci 22, 2035-2043. Makalowski, W. (2000). Genomic scrap yard: how genomes utilize all that junk. Gene 259, 61-67. Makalowski, W., Mitchell, G.A., and Labuda, D. (1994). Alu sequences in the coding regions of mRNA: a source of protein variability. Trends Genet 10, 188-193. Makalowski, W., and Toda, Y. (2007). Modulation of host genes by mammalian transposable elements. Genome Dyn 3, 163-174. Maksakova, I.A., and Mager, D.L. (2005). Transcriptional regulation of early transposon elements, an active family of mouse long terminal repeat retrotransposons. J Virol 79, 13865-13874. Maksakova, I.A., Romanish, M.T., Gagnier, L., Dunn, C.A., van de Lagemaat, L.N., and Mager, D.L. (2006). Retroviral elements and their hosts: insertional mutagenesis in the mouse germ line. PLoS Genet 2, e2. Malik, H.S., and Eickbush, T.H. (2001). Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res 11, 1187-1197. Marques, A.C., Dupanloup, I., Vinckenbosch, N., Reymond, A., and Kaessmann, H. (2005). Emergence of young human genes after a burst of retroposition in primates. PLoS Biol 3, e357.
Martin, S.L. (2006). The ORF1 Protein Encoded by LINE-1: Structure and Function During L1 Retrotransposition. J Biomed Biotechnol 2006, 45621. Martin, S.L., Bushman, D., Wang, F., Li, P.W., Walker, A., Cummiskey, J., Branciforte, D., and Williams, M.C. (2008). A single amino acid substitution in ORF1 dramatically decreases L1 retrotransposition and provides insight into nucleic acid chaperone activity. Nucleic Acids Res 36, 5845-5854. Martin, S.L., Li, W.L., Furano, A.V., and Boissinot, S. (2005). The structures of mouse and human L1 elements reflect their insertion mechanism. Cytogenet Genome Res 110, 223-228. Matlik, K., Redik, K., and Speek, M. (2006). L1 antisense promoter drives tissue-specific transcription of human genes. J Biomed Biotechnol 2006, 71753. Matsumoto, K., Nakayama, T., Sakai, H., Tanemura, K., Osuga, H., Sato, E., and Ikeda, J.E. (1999). Neuronal apoptosis inhibitory protein (NAIP) may enhance the survival of granulosa cells thus indirectly affecting oocyte survival. Mol Reprod Dev 54, 103-111. McClintock, B. (1953). Induction of Instability at Selected Loci in Maize. Genetics 38, 579-599. McEleny, K.R., Watson, R.W., Coffey, R.N., O'Neill, A.J., and Fitzpatrick, J.M. (2002). Inhibitors of apoptosis proteins in prostate cancer cell lines. Prostate 51, 133-140. Medstrand, P., Landry, J.R., and Mager, D.L. (2001). Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans. J Biol Chem 276, 1896-1903. Medstrand, P., van de Lagemaat, L.N., Dunn, C.A., Landry, J.R., Svenback, D., and Mager, D.L. (2005). Impact of transposable elements on the evolution of mammalian gene regulation. Cytogenet Genome Res 110, 342-352. Medstrand, P., van de Lagemaat, L.N., and Mager, D.L. (2002). Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res 12, 1483-1495.
Miao, E.A., Alpuche-Aranda, C.M., Dors, M., Clark, A.E., Bader, M.W., Miller, S.I., and Aderem, A. (2006). Cytoplasmic flagellin activates caspase-1 and secretion of interleukin 1beta via Ipaf. Nat Immunol 7, 569-575. Mitchell, R.S., Beitzel, B.F., Schroder, A.R., Shinn, P., Chen, H., Berry, C.C., Ecker, J.R., and Bushman, F.D. (2004). Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol 2, E234. Molofsky, A.B., Byrne, B.G., Whitfield, N.N., Madigan, C.A., Fuse, E.T., Tateda, K., and Swanson, M.S. (2006). Cytosolic recognition of flagellin by mouse macrophages restricts Legionella pneumophila infection. J Exp Med 203, 1093-1104. Moran, J.V., DeBerardinis, R.J., and Kazazian, H.H., Jr. (1999). Exon shuffling by L1 retrotransposition. Science 283, 1530-1534. Morgan, H.D., Sutherland, H.G., Martin, D.I., and Whitelaw, E. (1999). Epigenetic inheritance at the agouti locus in the mouse. Nat Genet 23, 314-318. Murnane, J.P., and Morales, J.F. (1995). Use of a mammalian interspersed repetitive (MIR) element in the coding and processing sequences of mammalian genes. Nucleic Acids Res 23, 2837-2839. Myers, J.S., Vincent, B.J., Udall, H., Watkins, W.S., Morrish, T.A., Kilroy, G.E., Swergold, G.D., Henke, J., Henke, L., Moran, J.V., et al. (2002). A comprehensive analysis of recently integrated human Ta L1 elements. Am J Hum Genet 71, 312-326. Nakagawa, Y., Hasegawa, M., Kurata, M., Yamamoto, K., Abe, S., Inoue, M., Takemura, T., Hirokawa, K., Suzuki, K., and Kitagawa, M. (2005). Expression of IAP-family proteins in adult acute mixed lineage leukemia (AMLL). Am J Hematol 78, 173-180. Narezkina, A., Taganov, K.D., Litwin, S., Stoyanova, R., Hayashi, J., Seeger, C., Skalka, A.M., and Katz, R.A. (2004). Genome-wide analyses of avian sarcoma virus integration sites. J Virol 78, 11656-11663. Nekrutenko, A., and Li, W.H. (2001). Transposable elements are found in a large number of human protein-coding genes. Trends Genet 17, 619-621.
Nigumann, P., Redik, K., Matlik, K., and Speek, M. (2002). Many human genes are transcribed from the antisense promoter of L1 retrotransposon. Genomics 79, 628-634. Nobukuni, T., Kobayashi, M., Omori, A., Ichinose, S., Iwanaga, T., Takahashi, I., Hashimoto, K., Hattori, S., Kaibuchi, K., Miyata, Y., et al. (1997). An Alu-linked repetitive sequence corresponding to 280 amino acids is expressed in a novel bovine protein, but not in its human homologue. J Biol Chem 272, 2801-2807. Nomura, T., Yamasaki, M., Nomura, Y., and Mimata, H. (2005). Expression of the inhibitors of apoptosis proteins in cisplatin-resistant prostate cancer cells. Oncol Rep 14, 993-997. Notarbartolo, M., Cervello, M., Dusonchet, L., and D'Alessandro, N. (2002). NAIP-deltaEx10-11: a novel splice variant of the apoptosis inhibitor NAIP differently expressed in drug-sensitive and multidrug-resistant HL60 leukemia cells. Leuk Res 26, 857-862. O'Riordan, M.X., Bauler, L.D., Scott, F.L., and Duckett, C.S. (2008). Inhibitor of apoptosis proteins in eukaryotic evolution and development: a model of thematic conservation. Dev Cell 15, 497-508. Ohno, S. (1970). Evolution by gene duplication (New York, Springer-Verlag). Ohno, S. (1999). Gene duplication and the uniqueness of vertebrate genomes circa 1970-1999. Semin Cell Dev Biol 10, 517-522. Ohshima, K., and Okada, N. (2005). SINEs and LINEs: symbionts of eukaryotic genomes with a common tail. Cytogenet Genome Res 110, 475-490. Orgel, L.E., and Crick, F.H. (1980). Selfish DNA: the ultimate parasite. Nature 284, 604-607. Ostertag, E.M., and Kazazian, H.H., Jr. (2001). Biology of mammalian L1 retrotransposons. Annu Rev Genet 35, 501-538. Pace, J.K., 2nd, Gilbert, C., Clark, M.S., and Feschotte, C. (2008). Repeated horizontal transfer of a DNA transposon in mammals and other tetrapods. Proc Natl Acad Sci U S A 105, 17023-17028. Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D., and Knowles, B.B. (2004).
Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 7, 597-606. Perepelitsa-Belancio, V., and Deininger, P. (2003). RNA truncation by premature polyadenylation attenuates human mobile element activity. Nat Genet 35, 363-366. Perna, N.T., Batzer, M.A., Deininger, P.L., and Stoneking, M. (1992). Alu insertion polymorphism: a new type of marker for human population studies. Hum Biol 64, 641-648. Pickeral, O.K., Makalowski, W., Boguski, M.S., and Boeke, J.D. (2000). Frequent human genomic DNA transduction driven by LINE-1 retrotransposition. Genome Res 10, 411-415. Pinsker, W., Haring, E., Hagemann, S., and Miller, W.J. (2001). The evolutionary life history of P transposons: from horizontal invaders to domesticated neogenes. Chromosoma 110, 148-158. Polak, P., and Domany, E. (2006). Alu elements contain many binding sites for transcription factors and may play a role in regulation of developmental processes. BMC Genomics 7, 133. Quentin, Y. (1992). Fusion of a free left Alu monomer and a free right Alu monomer at the origin of the Alu family in the primate genomes. Nucleic Acids Res 20, 487-493. Quentin, Y. (1994). Emergence of master sequences in families of retroposons derived from 7sl RNA. Genetica 93, 203-215. Rakyan, V.K., Blewitt, M.E., Druker, R., Preis, J.I., and Whitelaw, E. (2002). Metastable epialleles in mammals. Trends Genet 18, 348-351. Rakyan, V.K., Chong, S., Champ, M.E., Cuthbert, P.C., Morgan, H.D., Luu, K.V., and Whitelaw, E. (2003). Transgenerational inheritance of epigenetic states at the murine Axin(Fu) allele occurs after maternal and paternal transmission. Proc Natl Acad Sci U S A 100, 2538-2543. Rawn, S.M., and Cross, J.C. (2008). The evolution, regulation, and function of placenta-specific genes. Annu Rev Cell Dev Biol 24, 159-181. Ray, D.A., Pagan, H.J., Thompson, M.L., and Stevens, R.D. (2007). Bats with hATs: evidence for recent DNA transposon activity in genus Myotis. Mol Biol Evol 24, 632-639.
Ren, T., Zamboni, D.S., Roy, C.R., Dietrich, W.F., and Vance, R.E. (2006). Flagellin-deficient Legionella mutants evade caspase-1- and Naip5-mediated macrophage immunity. PLoS Pathog 2, e18. Roy, N., Mahadevan, M.S., McLean, M., Shutler, G., Yaraghi, Z., Farahani, R., Baird, S., Besner-Johnston, A., Lefebvre, C., Kang, X., et al. (1995). The gene for neuronal apoptosis inhibitory protein is partially deleted in individuals with spinal muscular atrophy. Cell 80, 167-178. Rozmahel, R., Heng, H.H., Duncan, A.M., Shi, X.M., Rommens, J.M., and Tsui, L.C. (1997). Amplification of CFTR exon 9 sequences to multiple locations in the human genome. Genomics 45, 554-561. Sayah, D.M., Sokolskaja, E., Berthoux, L., and Luban, J. (2004). Cyclophilin A retrotransposition into TRIM5 explains owl monkey resistance to HIV-1. Nature 430, 569-573. Scharf, J.M., Damron, D., Frisella, A., Bruno, S., Beggs, A.H., Kunkel, L.M., and Dietrich, W.F. (1996). The mouse region syntenic for human spinal muscular atrophy lies within the Lgn1 critical interval and contains multiple copies of Naip exon 5. Genomics 38, 405-417. Schmid, C.W. (1991). Human Alu subfamilies and their methylation revealed by blot hybridization. Nucleic Acids Res 19, 5613-5617. Schmid, C.W. (1996). Alu: structure, origin, evolution, significance and function of one-tenth of human DNA. Prog Nucleic Acid Res Mol Biol 53, 283-319. Schmutz, J., Martin, J., Terry, A., Couronne, O., Grimwood, J., Lowry, S., Gordon, L.A., Scott, D., Xie, G., Huang, W., et al. (2004). The DNA sequence and comparative analysis of human chromosome 5. Nature 431, 268-274. Seidl, R., Bajo, M., Bohm, K., LaCasse, E.C., MacKenzie, A.E., Cairns, N., and Lubec, G. (1999). Neuronal apoptosis inhibitory protein (NAIP)-like immunoreactivity in brains of adult patients with Down syndrome. J Neural Transm Suppl 57, 283-291. Shankar, R., Grover, D., Brahmachari, S.K., and Mukerji, M. (2004).
Evolution and distribution of RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Alu elements. BMC Evol Biol 4, 37. Shin, S.W., Lee, M.Y., Kwon, G.Y., Park, J.W., Yoo, M., Kim, S.K., Oh, T.H., and Choe, B.K. (2003). Cloning and characterization of rat neuronal apoptosis inhibitory protein cDNA. Neurochem Int 42, 481-491. Singer, S.S., Mannel, D.N., Hehlgans, T., Brosius, J., and Schmitz, J. (2004). From "junk" to gene: curriculum vitae of a primate receptor isoform gene. J Mol Biol 341, 883-886. Smit, A.F. (1996). The origin of interspersed repeats in the human genome. Curr Opin Genet Dev 6, 743-748. Smit, A.F. (1999). Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev 9, 657-663. Sorek, R., Ast, G., and Graur, D. (2002). Alu-containing exons are alternatively spliced. Genome Res 12, 1060-1067. Speek, M. (2001). Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes. Mol Cell Biol 21, 1973-1985. Stankiewicz, P., Shaw, C.J., Withers, M., Inoue, K., and Lupski, J.R. (2004). Serial segmental duplications during primate evolution result in complex human genome architecture. Genome Res 14, 2209-2220. Tang, W., Gunn, T.M., McLaughlin, D.F., Barsh, G.S., Schlossman, S.F., and Duke-Cohan, J.S. (2000). Secreted and membrane attractin result from alternative splicing of the human ATRN gene. Proc Natl Acad Sci U S A 97, 6025-6030. Tchenio, T., Casella, J.F., and Heidmann, T. (2000). Members of the SRY family regulate the human LINE retrotransposons. Nucleic Acids Res 28, 411-415. Ting, J.P., Lovering, R.C., Alnemri, E.S., Bertin, J., Boss, J.M., Davis, B.K., Flavell, R.A., Girardin, S.E., Godzik, A., Harton, J.A., et al. (2008). The NLR gene family: a standard nomenclature. Immunity 28, 285-287. Tomilin, N.V. (2008). Regulation of mammalian gene expression by retroelements and noncoding tandem repeats. Bioessays 30, 338-348.
Tran, V.K., Sasongko, T.H., Hong, D.D., Hoan, N.T., Dung, V.C., Lee, M.J., Gunadi, Takeshima, Y., Matsuo, M., and Nishio, H. (2008). SMN2 and NAIP gene dosages in Vietnamese patients with spinal muscular atrophy. Pediatr Int 50, 346-351. Turner, G., Barbulescu, M., Su, M., Jensen-Seaman, M.I., Kidd, K.K., and Lenz, J. (2001). Insertional polymorphisms of full-length endogenous retroviruses in humans. Curr Biol 11, 1531-1535. Ullu, E., and Weiner, A.M. (1985). Upstream sequences modulate the internal promoter of the human 7SL RNA gene. Nature 318, 371-374. van de Lagemaat, L.N., Landry, J.R., Mager, D.L., and Medstrand, P. (2003). Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet 19, 530-536. van de Lagemaat, L.N., Medstrand, P., and Mager, D.L. (2006). Multiple effects govern endogenous retrovirus survival patterns in human gene introns. Genome Biol 7, R86. Vaux, D.L., and Silke, J. (2005). IAPs, RINGs and ubiquitylation. Nat Rev Mol Cell Biol 6, 287-297. Villesen, P., Aagaard, L., Wiuf, C., and Pedersen, F.S. (2004). Identification of endogenous retroviral reading frames in the human genome. Retrovirology 1, 32. Vinckenbosch, N., Dupanloup, I., and Kaessmann, H. (2006). Evolutionary fate of retroposed gene copies in the human genome. Proc Natl Acad Sci U S A 103, 3220-3225. Vinzing, M., Eitel, J., Lippmann, J., Hocke, A.C., Zahlten, J., Slevogt, H., N'Guessan P, D., Gunther, S., Schmeck, B., Hippenstiel, S., et al. (2008). NAIP and Ipaf control Legionella pneumophila replication in human cells. J Immunol 180, 6808-6815. Vogt, V.M. (1997). Retroviral virions and genomes. In Retroviruses, J.M. Coffin, S.H. Hughes, and H.E. Varmus, eds. (Cold Spring Harbor, Cold Spring Harbor Laboratory Press), pp. 27-69. Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008).
Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476. Wang, H., Xing, J., Grover, D., Hedges, D.J., Han, K., Walker, J.A., and Batzer, M.A. (2005). SVA elements: a hominid-specific retroposon family. J Mol Biol 354, 994-1007. Wang, T., Zeng, J., Lowe, C.B., Sellers, R.G., Salama, S.R., Yang, M., Burgess, S.M., Brachmann, R.K., and Haussler, D. (2007). Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc Natl Acad Sci U S A 104, 18613-18618. Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-562. Whitelaw, E., and Martin, D.I. (2001). Retrotransposons as epigenetic mediators of phenotypic variation in mammals. Nat Genet 27, 361-365. Wilmanski, J.M., Petnicki-Ocwieja, T., and Kobayashi, K.S. (2008). NLR proteins: integral members of innate immunity and mediators of inflammatory diseases. J Leukoc Biol 83, 13-30. Wright, E.K., Goodart, S.A., Growney, J.D., Hadinoto, V., Endrizzi, M.G., Long, E.M., Sadigh, K., Abney, A.L., Bernstein-Hanley, I., and Dietrich, W.F. (2003). Naip5 affects host susceptibility to the intracellular pathogen Legionella pneumophila. Curr Biol 13, 27-36. Wu, X., Li, Y., Crise, B., and Burgess, S.M. (2003). Transcription start regions in the human genome are favored targets for MLV integration. Science 300, 1749-1751. Xing, J., Wang, H., Belancio, V.P., Cordaux, R., Deininger, P.L., and Batzer, M.A. (2006). Emergence of primate genes by retrotransposon-mediated sequence transduction. Proc Natl Acad Sci U S A 103, 17608-17613. Xu, D.G., Crocker, S.J., Doucet, J.P., St-Jean, M., Tamai, K., Hakim, A.M., Ikeda, J.E., Liston, P., Thompson, C.S., Korneluk, R.G., et al. (1997a). Elevation of neuronal expression of NAIP reduces ischemic damage in the rat hippocampus. 
Nat Med 3, 997-1004. Xu, D.G., Korneluk, R.G., Tamai, K., Wigle, N., Hakim, A., Mackenzie, A., and Robertson, G.S. (1997b). Distribution of neuronal apoptosis inhibitory protein-like immunoreactivity in the rat central nervous system. J Comp Neurol 382, 247-259. Xu, M., Okada, T., Sakai, H., Miyamoto, N., Yanagisawa, Y., MacKenzie, A.E., Hadano, S., and Ikeda, J.E. (2002). Functional human NAIP promoter transcription regulatory elements for the NAIP and PsiNAIP genes. Biochim Biophys Acta 1574, 35-50. Yamamoto, K., Abe, S., Nakagawa, Y., Suzuki, K., Hasegawa, M., Inoue, M., Kurata, M., Hirokawa, K., and Kitagawa, M. (2004). Expression of IAP family proteins in myelodysplastic syndromes transforming to overt leukemia. Leuk Res 28, 1203-1211. Yamamoto, K., Sakai, H., Hadano, S., Gondo, Y., and Ikeda, J.E. (1999). Identification of two distinct transcripts for the neuronal apoptosis inhibitory protein gene. Biochem Biophys Res Commun 264, 998-1006. Yang, N., Zhang, L., Zhang, Y., and Kazazian, H.H., Jr. (2003). An important role for RUNX3 in human L1 transcription and retrotransposition. Nucleic Acids Res 31, 4929-4940. Yaraghi, Z., Korneluk, R.G., and MacKenzie, A. (1998). Cloning and characterization of the multiple murine homologues of NAIP (neuronal apoptosis inhibitory protein). Genomics 51, 107-113. Zaiss, D.M., and Kloetzel, P.M. (1999). A second gene encoding the mouse proteasome activator PA28beta subunit is part of a LINE1 element and is driven by a LINE1 promoter. J Mol Biol 287, 829-835. Zamboni, D.S., Kobayashi, K.S., Kohlsdorf, T., Ogura, Y., Long, E.M., Vance, R.E., Kuida, K., Mariathasan, S., Dixit, V.M., Flavell, R.A., et al. (2006). The Birc1e cytosolic pattern-recognition receptor contributes to the detection and control of Legionella pneumophila infection. Nat Immunol 7, 318-325. Zemojtel, T., Kielbasa, S.M., Arndt, P.F., Chung, H.R., and Vingron, M. (2009). Methylation and deamination of CpGs generate p53-binding sites on a genomic scale.
Trends Genet 25, 63-66. Zemojtel, T., Penzkofer, T., Schultz, J., Dandekar, T., Badge, R., and Vingron, M. (2007). Exonization of active mouse L1s: a driver of transcriptome evolution? BMC Genomics 8, 392. Zhang, Y., Maksakova, I.A., Gagnier, L., van de Lagemaat, L.N., and Mager, D.L. (2008). Genome-wide assessments reveal extremely high levels of polymorphism of two active families of mouse endogenous retroviral elements. PLoS Genet 4, e1000007. Zhou, L., Mitra, R., Atkinson, P.W., Hickman, A.B., Dyda, F., and Craig, N.L. (2004). Transposition of hAT elements links transposable elements and V(D)J recombination. Nature 432, 995-1001.

CHAPTER 2 REPEATED RECRUITMENT OF LTR RETROTRANSPOSONS AS PROMOTERS BY THE ANTI-APOPTOTIC LOCUS NAIP DURING MAMMALIAN EVOLUTION¹

¹ A version of this chapter has been published. Romanish, M.T., Lock, W.M., van de Lagemaat, L.N., Dunn, C.A., Mager, D.L. (2007). Repeated recruitment of LTR retrotransposons as promoters by the anti-apoptotic locus NAIP during mammalian evolution. PLoS Genetics 3: e10. WML performed bioinformatics analysis discussed in 2.3.6. LNvdL identified NAIP as a candidate LTR-gene chimera. CAD served an advisory role.

2.1 Introduction

It is apparent that human genes achieve incredible diversity through their alternative regulation (Carninci et al., 2006; Johnson et al., 2003; Kwan et al., 2008; Wang et al., 2008). Transposable elements, particularly the RNA pol II regulatory motif-bearing LTRs and LINEs, represent ubiquitous transcriptional regulatory modules in host genomes, as reviewed in Chapter 1. While their control over the expression of numerous cellular genes is known, this study documents an extremely unusual case of LTR domestication by a single gene, neuronal apoptosis inhibitory protein (NAIP).
The specific involvement of NAIP in a variety of diseases, coupled with its cellular role as an IAP, prompted an in-depth analysis of its transcriptional regulation. In this study, the NAIP genes of human and mouse are shown to have independently domesticated multiple LTR elements as promoters since their divergence from a common ancestor. Furthermore, it is demonstrated that the 5’ flanking regions of IAP family genes are enriched for LTR-derived sequences relative to other genes. To account for these findings, we offer several possible scenarios, including the suggestion that utilization of LTR promoters by NAIP may have been evolutionarily favored due to this gene’s anti-apoptotic function. Whichever scenario applies, these data substantiate the earlier view that rapidly evolving genes, such as those involved in immunity or cellular defense, are more likely to use TE promoters (van de Lagemaat et al., 2003).

2.2 Materials and methods

2.2.1 RNA isolation

Primary mouse tissue samples were dissected from healthy adult male C57BL/6J (B6) mice and preserved in RNAlater (Ambion). All samples were processed using TRIzol (Invitrogen), except peripheral blood leukocytes (PBLs), for which the QIAamp RNA Blood Mini Kit (Qiagen) was used. B6 testis, Sprague-Dawley rat, and all human RNA samples were purchased from Clontech, with the exception of primary human blood and placenta samples, which were obtained from Dr. C. Eaves (Terry Fox Laboratory) and Dr. P. Medstrand (Lund University, Sweden), respectively.

2.2.2 5’ Rapid Amplification of cDNA Ends (5’ RACE)

5’ RACE analysis of human blood, colon, placenta and testis, B6 liver and placenta, and Sprague-Dawley spleen RNA was performed using the FirstChoice RLM-RACE kit (Ambion).
The manufacturer’s recommendations were followed, but on occasion several kit components [calf intestinal phosphatase (CIP), RNA ligase, and MuLV reverse transcriptase (RT)] were replaced with laboratory stocks of CIP (New England Biolabs), RNA ligase (New England Biolabs), and SuperScript III RT (Invitrogen). Gene-specific reverse primers and reaction conditions are summarized in Table 2.S2.

2.2.3 Genomic PCR and generation of constructs

Genomic DNA (gDNA) was isolated from B6 liver using DNAzol (Invitrogen) as outlined by the manufacturer. Only the ORR1E LTR of each Naip copy was amplified, using Platinum Taq HIFI (Invitrogen) according to the manufacturer’s instructions. The mNaipe/f LTR constructs were 5’ truncated by ~100 bp because of ~3 kb of intervening LINE1 sequence, which we opted not to include. PCR reaction conditions and primers used for amplification of fragments are listed in Table 2.S2. Primers incorporated AflII and HindIII restriction enzyme recognition sequences to facilitate directional cloning into a modified pGL3B (Promega) vector. All constructs were sequenced to verify their fidelity. Our promoterless pGL3B vector is a slight modification of the manufacturer’s and has been published elsewhere (Wilhelm et al., 2003). Briefly, the multiple cloning site was replaced with a series of strong polyadenylation signals to reduce background luciferase expression.

2.2.4 Cell culture and luciferase assays

All cell lines assayed were B6-derived: MS1 (pancreatic), EL4 and RMA-E3 (lymphoid). Cells were cultured in DMEM (Stemcell Technologies) supplemented with 10% fetal bovine serum (Invitrogen) and grown at 37 °C under 5% CO2. Cell stocks were maintained in penicillin/streptomycin, but all transfection experiments were carried out in its absence. Prior to transfection, suspension cells (EL4 and RMA-E3) were seeded at 500,000 cells per well and adherent cells (MS1) at 50,000 cells per well, in 24-well plates.
Lipofectamine (Invitrogen) and Lipofectamine 2000 (Invitrogen) were used to deliver our constructs to adherent and suspension cells, respectively, according to the manufacturer’s guidelines. Approximately 24 hours post-transfection, the cells were washed with PBS (Stemcell Technologies), processed, and analyzed for firefly and Renilla luciferase activity using the Dual Luciferase Reporter Assay System (Promega). All values were first standardized to the Renilla luciferase control vector to control for transfection efficiency, and then to the modified promoterless pGL3B construct.

2.2.5 cDNA synthesis and RT-PCR

Initial experiments used RNA reverse transcribed with SuperScript II (Invitrogen) as described elsewhere (Landry et al., 2002). These findings were confirmed using RNA reverse transcribed with SuperScript III (Invitrogen) and random hexamer primers according to the manufacturer’s recommendations. cDNA amplification was carried out using Platinum Taq (Invitrogen) over 35 cycles. All primers and their associated annealing temperatures and extension times are summarized in Table 2.S2.

2.2.6 Quantitative RT-PCR

cDNA for quantitative RT-PCR was prepared as above and amplified with Power SYBR Green PCR Master Mix (Applied Biosystems) in the ABI 7500 Real Time PCR System (Applied Biosystems). Stock primers were at a concentration of 10 µM, and standard curves showed that they amplified with equal efficiency within a certain range of template dilution. Consequently, the comparative CT method was used to quantify target (open reading frame and LTR-derived) transcripts relative to a GAPDH endogenous control in testis and kidney. Each experiment was conducted three times, with at least two replicates per plate, and the cycling parameters were as follows: 50 °C for 2 minutes; 95 °C for 10 minutes; then 40 cycles of 95 °C for 15 seconds and 60 °C for 1 minute. At the end of each run, dissociation curves were generated, which indicated the specificity of amplification; specificity was also verified by RT-PCR (data not shown).
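As a minimal sketch of the comparative CT calculation described above, the following Python fragment expresses a target transcript relative to the GAPDH control as 2^-(CT target - CT GAPDH). It assumes equal amplification efficiencies for both primer pairs, as the standard curves indicated. The function name and all CT values are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Comparative CT: 2^-(mean CT of target - mean CT of reference).

    Assumes both primer pairs amplify with equal (near 100%) efficiency.
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


# Hypothetical replicate CT values (illustration only, not thesis data).
testis = relative_expression([24.0, 24.2, 23.8], [18.0, 18.1, 17.9])
kidney = relative_expression([27.0, 27.1, 26.9], [18.0, 18.0, 18.0])

# Fold difference of the target between the two tissues.
fold = testis / kidney
print(f"testis={testis:.6f} kidney={kidney:.6f} fold={fold:.2f}")
```

Normalizing each tissue to its own GAPDH value before comparing tissues keeps differences in input cDNA from being read as differences in transcript abundance.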
Because splice variants complicated primer design (Figures 2.3a and 2.S1), we were able to quantify only one of the HERV-P LTR-promoted forms (topmost band, Figure 2.3a and top form in Figure 2.S1a), and estimated that it represented half of the total LTR-derived transcripts. The value obtained was therefore doubled to estimate total LTR-derived transcripts, and this doubling is reflected in Figure 2.1b. Real-time primers are listed in Table 2.S2 and all begin with the prefix ‘q’.

2.2.7 Sequencing

PCR products and reporter constructs were cloned into the T-vector (Promega) or our modified pGL3B (Promega), respectively, and sequenced at the McGill University sequencing facility. Sequencing verified that primers selectively amplified target genes and not their paralogues, with the exception of the lower band in the mNaipe and mNaipf open reading frame RT-PCR panels (Figure 2.4b), which was identified as mNaipb. All sequences were stored and analyzed in the SDSC Biology Workbench (www.workbench.sdsc.edu/), which offers a suite of analytical tools.

2.2.8 Dotplots

DNA sequences surrounding the LTR promoters of mouse and rat Naip (mNaip and rNaip) were obtained from the UCSC Genome Browser (http://genome.ucsc.edu/, mm8 and rn3, respectively). Comparative analysis of genomic sequence was completed using the web-based jdotter (www.athena.bioc.uvic.ca/workbench.php?tool=jdotter&db=). All dotplots were prepared using a 25 bp window, and the greymap tool was iteratively adjusted to distinguish true lines of homology from background. Analyzed sequences were manually annotated across their lengths.

2.2.9 Analysis of retroelements in 5’ flanking gene regions

Overall base pair coverage by retroelements (LTRs, LINEs, and SINEs) in a 12.5 kb window (10 kb upstream and 2.5 kb downstream) surrounding the 5' terminus of the longest annotated transcript of IAP family genes (delineated in EnsEMBL-v37) was determined.
Annotation files generated by RepeatMasker (v3.1.4) from the human genome (hg17) and the mouse genome (mm7) were used to obtain pertinent attributes for all repeat elements. Base pair coverage by different retroelement classes among human and mouse IAP genes (8 in human, BIRC1-8; 8 in mouse, mNaipa/b and Birc2-7) was compared to 1,000 randomly selected comparable-sized sets of genes. The mNaipe/f genes were excluded because they were recently duplicated from an mNaipb-like gene. Numbers of LTR insertions in the window for the human and mouse IAP genes (manually checked for accuracy) are shown in Table 2.S1. However, because indels and rearrangements of ancient TEs hampered accurate automated tabulation of numbers of insertion events for the random sets of genes, we instead determined total base pair coverage by the three retroelement classes upstream of the IAP genes and random sets of genes. For endogenous retroviral-like elements, we considered the LTR part only, because LTRs are known to harbour regulatory signals. We therefore excluded sequences annotated as ERV internal sequence, which are annotated in human with names including the text strings “ERVL”, “HERV”, “-int”, “Harlequin”, and “HUERS-”. In mouse, internal sequences were identified by names including the text strings “_I”, “-int”, and “ERV”.

Names and accession numbers (human/mouse) for the IAP genes used were: NAIP or BIRC1 (U19251/mNaipa-AF135491, mNaipb-AF135490); cIAP1 or BIRC2 (BX647978/U88909); cIAP2 or BIRC3 (AF070674/U88908); XIAP or BIRC4 (U32974/U88990); survivin or BIRC5 (CR612752/W97263); bruce or BIRC6 (AF265555/Y17267); livin or BIRC7 (AY358835/BC107260); and TsIAP or BIRC8 (AF420440) [no ESTs or cDNAs have been reported for mouse TsIAP, despite its presence on chromosome 7, so it was omitted from the mouse analysis].
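The coverage analysis described in this section can be sketched in outline as below. This is an illustrative reconstruction under stated assumptions, not the scripts actually used: all function names are hypothetical, intervals are taken to be half-open (start, end) pairs already parsed from the RepeatMasker annotation, and the percentile is defined here simply as the share of random gene sets with lower coverage than the observed set. Only the window dimensions and the human internal-sequence name strings are taken from the text.

```python
# Illustrative sketch only: hypothetical helpers, half-open (start, end) intervals.

# Name substrings marking ERV internal sequence in the human annotation (from the text).
HUMAN_INTERNAL_MARKERS = ("ERVL", "HERV", "-int", "Harlequin", "HUERS-")


def tss_window(tss, strand, upstream=10_000, downstream=2_500):
    """Return the 12.5 kb window around a TSS, oriented by gene strand."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream


def is_internal_erv(repeat_name, markers=HUMAN_INTERNAL_MARKERS):
    """True if the repeat name contains any internal-sequence marker string."""
    return any(marker in repeat_name for marker in markers)


def covered_bases(intervals, window_start, window_end):
    """Bases of the window covered by at least one interval (overlaps merged)."""
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in intervals
        if start < window_end and end > window_start
    )
    total = 0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start   # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)        # extend the current merged block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def empirical_percentile(observed, random_values):
    """Percentage of random gene sets with coverage below the observed value."""
    below = sum(1 for value in random_values if value < observed)
    return 100.0 * below / len(random_values)
```

Merging overlapping annotations before counting avoids double-counting bases where fragments of the same or nested elements overlap, which matters when coverage rather than insertion number is the quantity being compared.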
2.3 Results

2.3.1 Transcription of mammalian NAIP genes initiates within LTRs

In a screen of mouse and human gene expression databases similar to a previous study (van de Lagemaat et al., 2003), we identified NAIP as an example of a gene with transcripts initiating within an LTR sequence, indicating potential use of the LTR as a promoter. Surprisingly, EST and RefSeq data indicated that human and mouse had recruited completely unrelated LTRs as promoters for these orthologues. The published human NAIP transcription start site (TSS) reported by Xu et al. (Xu et al., 2002) overlies a MER21C solitary LTR, which is itself interrupted by a HUERS-P3/LTR-9 element as annotated by Repbase (see Figure 2.1a) (Jurka et al., 2005). This latter ERV family has previously been termed HERV-P (Yi et al., 2007), and we will use this nomenclature throughout. While transcriptional regulation of the mouse Naip genes has not been studied in great detail, database transcripts initiate from a solitary ORR1E LTR of the MaLR superfamily (Smit, 1993). The potential usage of different LTRs in regulation of mammalian gene orthologues has not been documented previously and this fact prompted a further investigation to confirm and extend our bioinformatics screens. Although the human NAIP TSS and promoter active in the THP1 leukemic cell line had previously been characterized (Xu et al., 2002) (form iii, Figure 2.1a), the LTR nature of the underlying sequence had escaped notice. We screened primary RNA samples from human blood, colon, placenta and testis by 5’ RACE and could not confirm this TSS in these tissues. As Xu et al. did, we also attempted to localize a 5’ start site in the region by RT-PCR using successively tiled primers along the length of the MER21C/HERV-P, and extending beyond its 5’ flank, combined with a common reverse primer. This analysis of blood, placenta, and testis cDNA yielded numerous products, due to the repetitive nature of the target sequence.
Using Southern blotting, we resolved specific products for all primer sets across the MER21C/HERV-P complex. In addition, one primer upstream of this complex, located between the adjacent MIR SINE and nearby AluSc SINE, also gave a product of the expected size, but a primer upstream of the AluSc did not (see Figure 2.1a and data not shown). These data indicate that a NAIP promoter may exist which incorporates SINE and LTR sequences into a repeat-rich 5’ UTR. While 5’ RACE was unsuccessful in confirming the previously published start site (form iii, Figure 2.1a), evidence for at least two other promoters was discovered. The principal TSS in all tissues tested (form i, Figure 2.1a), also strongly supported by EST data, lies within the third exon of the published cDNA from THP1 cells (form iii, Figure 2.1a), suggesting that the major promoter is upstream of this TSS. In testis, we identified two other closely spaced TSSs. Remarkably, they lie within the same MER21C/HERV-P complex but are located in the 3’ LTR of the HERV-P element, suggesting use of this LTR as an alternative promoter (form ii, Figure 2.1a). One of these HERV-P TSSs is supported by a testis EST (DB097870). Using quantitative RT-PCR, we determined that this HERV-P LTR promoter is responsible for ~12% of total NAIP transcripts in normal testis but none were detectable in kidney (Figure 2.1b). We also confirmed by RT-PCR that a full-length transcript encoding an intact NAIP open reading frame (ORF) is produced from the LTR promoter (data not shown). Various transcriptional regulatory features such as a putative TATA box, initiator and downstream promoter elements (Butler and Kadonaga, 2002) were identified in the sequence underlying sites of LTR and non-LTR NAIP transcription (Figure 2.1c,d).
Interestingly, the 5’-most TSS within the HERV-P LTR overlies an initiator element (Butler and Kadonaga, 2002) that overlaps the putative CCAAT box previously detected in other members of this family of LTRs (Kroger and Horak, 1987). However, the putative TATA box identified in that study, while present in our example, does not appear to be used as it is located downstream of the TSSs identified by 5’ RACE. These features and the extents of all 5’ RACE clones are shown in Figure 2.1c. To verify our identified NAIP TSSs, we checked the “Cap Analysis of Gene Expression” (CAGE) database (Carninci et al., 2006) for mapped TSSs for the human NAIP gene but none were found.

Figure 2.1. Contribution of LTR promoters to human NAIP transcription and a summary of 5’ RACE results. (A) Representation of 5’ region of human NAIP gene. Transcription initiates at arrows situated above the underlying genomic DNA, with representative RNAs pictured above. Black boxes represent exons in DNA and RNA forms. White boxes represent a solitary MER21C LTR into which a HERV-P element has inserted (grey box). Sections of the HERV-P labeled 5’ and 3’ represent the 5’ and 3’ LTRs of this partly deleted ERV. Both the MER21C and the HERV-P are oriented in the same transcriptional direction as the NAIP gene. The boxes to the left of the MER21C denote an AluSc SINE and an MIR SINE (unlabeled). Three TSSs for human NAIP have been reported or were identified here: isoform i is found in all tissues tested, while ii represents the testis-specific HERV-P start site and iii represents the published TSS determined in the THP1 leukemic cell line (Xu et al., 2002). (B) Quantitative real-time RT-PCR analysis of human testis and kidney cDNA to determine contribution of the HERV-P LTR promoter to total NAIP transcription.
Total transcript levels were determined using primers that amplify all of the most prevalent transcript forms and LTR-driven transcripts were determined using one primer in the LTR (see Figure 2.S1a for locations of primers and Materials and Methods for details). Expression levels are normalized to GAPDH and represented relative to total NAIP transcript levels in testis. Assays were carried out in duplicate and repeated three times in testis and twice in kidney. (C) Partial sequence of the HERV-P element (5’ end corresponds to chr.5: 70,355,179 of the human [hg18] draft sequence) underlying testis-specific TSSs of NAIP. The numbers of sequenced 5’ RACE clones aligning to particular TSSs are shown above the sequence. The putative TATA box identified previously in HERV-P LTRs (Kroger and Horak, 1987) is at the end of the sequence shown. (D) Underlying sequence and TSSs determined for the non-LTR promoter (chr.5: 70,352,387) in blood, liver, placenta and testis. Lowercase letters distinguish intron/exon boundary. Two 5’ RACE clones aligned upstream of the intronic sequence shown. Numbers above boldfaced nucleotides indicate sites of transcription and the number of 5’ RACE clones that align to each TSS. Underlines and overlines indicate putative initiator elements (Inr) and downstream promoter elements (DPE), respectively (Butler and Kadonaga, 2002). Boxed sequence represents a putative TATA box. Full characterization of human UTRs can be found in Figure 2.S1.

As mentioned above, database screens indicated that transcription of the mouse Naip genes initiates within an ORR1E LTR common to all mouse copies. We conducted 5’ RACE on primary B6 colon and liver RNA using primers specific for each Naip copy (mNaipa/b/c/e/f) (Growney and Dietrich, 2000). No evidence of mNaipc transcription was detected and it may represent a pseudogene detected through genomic Southern blots (Growney and Dietrich, 2000).
For all other mouse Naip genes, the major TSSs mapped within the common ORR1E LTR, confirming the database screens (Figure 2.2a). Due to the conserved position of a motif resembling a TATA box, sequence identity of flanking nucleotides, and localization of most TSSs 25-32 bp downstream of the TATA motif for all mNaip copies, these LTRs appear to be typical TATA box promoters (Figure 2.2b). The mNaipb gene is the only mouse gene with more than one CAGE tag and two clusters of these tags correspond very well to our identified TSSs (Figure 2.2b). This 5’ RACE analysis also uncovered two alternative promoters for some of the mouse genes, one of which is a second LTR. The progenitor of the mNaipe/f paralogues was targeted by an MTC LTR (Smit, 1993), immediately 5’ of the first coding exon, prior to the duplication that created these two genes (Figure 2.2a and 2.2c). We found unique TSSs for each of these genes mapping within this LTR, indicating its use as an alternative promoter. Finally, a minority of mNaipb transcripts initiate from a non-LTR promoter downstream of the initiation codon (Figure 2.2d), but within the first coding exon. The putative novel protein deriving from this isoform (not shown in Figure 2.2a) could potentially utilize a downstream initiation codon, resulting in an N-terminal truncated peptide encoding only the third BIR domain followed by the NBS and LRR motifs. Positions of 5’ RACE clones as well as surrounding transcriptional regulatory features are summarized in Figures 2.2b, c, and d. Unfortunately, MaLR LTRs have not been characterized for their regulatory signals; therefore, we could not compare our results to other functional studies.

Figure 2.2. Contribution of LTR promoters to mouse and rat Naip transcription and a summary of 5’ RACE results. (A) Representation of 5’ region of rodent Naip genes. Transcription initiates at arrows situated above the underlying genomic DNA, with representative RNAs pictured above.
Grey shaded boxes represent the solitary LTR insertions, and black boxes represent exons in DNA and RNA forms. Mouse and rat Naip transcription predominantly initiates in ORR1E LTRs. mNaipe and mNaipf have an MTC LTR (dashed grey box) and ~3 kb of L1_Mus1 LINE1 sequence has integrated into the ORR1E LTRs associated with these two genes, shown by a dashed white box. The rNaip2 ORR1E LTR has also been interrupted by an independent insertion of 300 bp of Lx2A1 LINE1, shown by a solid white box. (B) Partial alignment of the rodent ORR1E LTRs associated with Naip transcription. The 5’ ends of the sequences shown correspond to the following coordinates in the mouse (mm8) and rat (rn4) draft sequences: mNaipa=chr.13:101,553,198; mNaipb=chr.13:101,302,420; mNaipe=chr.13:101,347,641; mNaipf=chr.13:101,418,005; rNaip1=chr.2:31,268,656; rNaip2=chr.2:31,204,793. (C) Partial alignment of the mNaipe/f MTC alternative promoters (mNaipe=chr.13:101,346,591; mNaipf=chr.13:101,416,943). (D) Genomic sequence surrounding the mNaipb non-LTR promoter (mNaipb=chr.13:101,289,682). Numbers above boldfaced nucleotides indicate sites of transcription initiation and the number of 5’ RACE clones obtained that align to each TSS. Asterisks denote sites of transcription that are supported by >1 CAGE tag (Carninci et al., 2006). A few mNaipe clones aligned beyond the boundaries of the ORR1E sequence shown. Underlines indicate putative Inr elements and boxed sequence represents putative TATA boxes. Full characterization of mouse UTRs can be found in Figure 2.S2.

Very little is known about NAIP transcription in rat and only a partial cDNA (AF361881) has been deposited in the database (Shin et al., 2003). However, ECGene gene prediction software (UCSC Genome Browser) suggests that two tandem copies exist, which we have termed rNaip1 and rNaip2. Based on these predictions, reverse primers were designed and 5’ RACE was carried out on rat spleen RNA.
This analysis confirmed expression of both rat genes in the spleen and found that each initiates within an ORR1E LTR, analogous to the mouse genes (Figure 2.2a). Figure 2.2b aligns the mouse and rat ORR1E LTR regions encompassing the 5’ termini of all RACE clones and shows putative regulatory features.

2.3.2 Tissue distribution of NAIP expression

To better understand the breadth of use of the human, mouse, and rat LTR promoters, we screened a broad panel of tissues by RT-PCR. Two sets of primers were used: one set selectively amplified LTR-derived transcripts, and the other set spanned protein coding exons to measure total gene expression (including transcripts deriving from alternative promoters). In human, constitutive expression of the NAIP coding region was observed in all tissues screened (Figure 2.3a-panel O). Using primers specific for the HERV-P-initiated form (form ii, Figure 2.1a), we detected transcripts in testis, as expected, and a low level in prostate, but in no other tissues (Figure 2.3a-panel Lii). Interestingly, the HERV-P family in general has been shown to be expressed in testis, prostate, and brain (Yi et al., 2007). Using primers specific for the transcripts previously characterized by Roy et al. (Roy et al., 1995) and Xu et al. (Xu et al., 2002) (form iii, Figure 2.1a), we found only very faint signals in blood, lung, and testis (Figure 2.3a-panel Liii). Due to the requirement of one primer annealing to repetitive DNA, we verified the identity of all PCR products by sequencing. Several alternative exons deriving from repetitive DNA were discovered in both the UTR and ORF and are summarized in Figure 2.S1.

Figure 2.3. Transcriptional profile of human (A), mouse (B), and rat (C) NAIP across the indicated primary tissues. Primers selective for LTR-derived transcripts (L) or coding sequence (O) determined the breadth of LTR promoter use in all tissues in all organisms.
In part A, L (form iii) primers were specific for the MER21C LTR transcribed form and L (form ii) primers were specific for the HERV-P form. A GAPDH control is shown at the bottom of each panel.

Similar RT-PCRs were also performed for the individual mouse and rat gene copies across numerous tissues. Due to the very high overall sequence identity of these genes, the specificity of RT-PCR products was confirmed by sequencing. In all cases, the pattern of expression for the ORR1E-driven transcript forms was very similar to the pattern obtained using primers within coding regions, indicating that the ORR1E LTRs are the major Naip promoters (Figure 2.3b and 2.3c). We verified the mNaip TSSs by RT-PCR with primers upstream of the putative TATA boxes and, as expected, observed no RT-PCR products (data not shown). A panel including more mouse tissues than the one shown (Figure 2.3b) also showed a very similar pattern of expression using the different primer sets (data not shown). Various splice isoforms identified among the mouse copies also incorporate exons deriving from both repetitive and non-repetitive DNA, summarized in Figure 2.S2.

2.3.3 Promoter activity of the ORR1E LTRs

In other reported cases of LTRs acting as promoters for cellular genes, the LTR has been a minor or tissue-specific promoter (Leib-Mosch et al., 2005; Medstrand et al., 2005). The fact that the rodent Naip genes appear to employ an LTR as a primary constitutive promoter is therefore highly unusual. To confirm that mouse ORR1E LTRs possess promoter activity, reporter gene assays were performed. Constructs of the ORR1E LTRs for each mouse copy were tested in MS1, EL4, and RMA-E3 B6 cell lines. Although the scale of luciferase activity varied among cell lines, the same general trends were observed (Figure 2.4, data not shown).
All tested constructs showed marked increases over a promoterless control, and the mNaipa and mNaipb LTR constructs were comparable in activity to the SV40 promoter. The mNaipe and mNaipf ORR1E LTR constructs had lower promoter activity but were also 5’ truncated by ~100 bp because we did not include any of the intervening LINE1 sequence disrupting these ORR1E copies in our constructs. The fact that these truncated constructs have lower promoter activity could indicate the presence of positive regulatory element(s) within the 5’ terminus of these ORR1E LTRs, consistent with typical retroviral LTRs (Majors, 1990). Subtle sequence differences also play a role in the different promoter activities since the highly similar mNaipe and mNaipf LTRs (97% identical) differ in promoter activity (Figure 2.4).

Figure 2.4. Promoter activity of the mNaip LTRs. The ORR1E LTRs for each copy were cloned into a modified pGL3B vector and tested for luciferase activity in the MS1 cell line. pGL3B and pGL3P, containing an SV40 promoter, were used as negative and positive controls, respectively. Luciferase activity was normalized relative to the co-transfected Renilla luciferase control and then to pGL3B to demonstrate fold activation. Each bar represents the mean of at least four independent transfections ± SEM.

2.3.4 Rapid evolution of the NAIP promoter regions

A likely evolutionary scheme to explain association of the LTR elements with mammalian NAIP genes is shown in Figure 2.5. The MER21C and HERV-P elements must have inserted upstream of the ancestral primate NAIP gene at least 40 million years ago since both are present in Old World (human, chimpanzee, Rhesus monkey) and New World (marmoset) primates, according to genome database comparisons (data not shown).
The most probable scenario to explain the presence of ORR1E LTRs upstream of all rodent Naip genes is that the element inserted upstream of the ancestral rodent gene and then was included in subsequent duplication events involving the gene. At a later stage, the mNaipe/f progenitor acquired an MTC LTR (Figure 2.5). Interestingly, alignments of the four mouse and two rat ORR1E LTRs reveal that mNaipb/e/f and rNaip2 are ~85% identical to each other, and that a similar level of identity exists between mNaipa and rNaip1. In contrast, mNaipb/e/f:mNaipa and rNaip2:rNaip1 LTR copies are less similar to each other, exhibiting 60-65% identity, an unusual finding considering that a similarly sized, repeat-free, non-coding segment of intron 8 from rNaip1/2 and mNaipa/b exhibits nucleotide identity on the order of 90% among all copies. Moreover, comparisons of the various rodent Naip gene-coding regions (rNaip1 and rNaip2, and mNaipa and mNaipb) also give levels of nucleotide identity of ~90% (data not shown) and do not clearly distinguish orthologous gene pairs. These data indicate that gene conversion events have homogenized the genomic sequence encoding Naip, obscuring the evolutionary relationships of intronic and coding regions. While we assume that the ORR1E LTRs associated with these genes derive from a single ancestral insertion, we also addressed by phylogenetics the less likely possibility that the present LTR-gene arrangements arose by independent insertion of different ORR1E LTRs into progenitors of mNaipa/rNaip1 and mNaipb/rNaip2. Unfortunately, the age and divergence of these and other MaLRs, coupled with extensive genomic rearrangements in the region, hindered phylogenetic analyses and comparisons of flanking regions. However, the rodent Naip ORR1Es are more similar to one another than to others present in either genome, supporting the premise that they derive from one original insertion.

Figure 2.5. Association of LTR elements with NAIP through mammalian evolution.
A single NAIP progenitor was present in the last common ancestor of primates and rodents. Following the primate/rodent split, NAIP was independently targeted by multiple lineage-specific LTRs. In human, NAIP is part of a large inverted duplication but the centromeric copy is a pseudogene. In rodents, this locus duplicated prior to mouse-rat divergence. In mouse, Naip has undergone further expansion, where the two youngest copies, mNaipe and f, acquired the MTC LTR.

While segments of the ORR1E elements have been retained, their genomic environments appear to have been subjected to repeated disruption by rearrangements and other TE insertions. This is illustrated in Figure 2.6 in which DNA sequences surrounding each LTR are compared using dot plots. This analysis demonstrates that the 5’ regions flanking mNaipa:rNaip1 and mNaipb:rNaip2 are orthologous as the lines of homology are more robust than between reciprocal dotplots. This agrees with the sequence comparisons of the individual LTRs. All combinations of dotplots comparing sequence surrounding the ORR1E LTRs of rodent Naip paralogues revealed a line of homology beginning near the annotated start of the LTRs and extending to a common point ~150 bp beyond the annotated ends, with no other significant similarity in the regions. It would seem that only parts of the LTR and the flanking ~150 bp region have been retained amidst rapid turnover of surrounding sequences. Dotplots across the entirety of genomic DNA encoding the rodent Naip genes revealed that most of the retrotransposon integrations are not shared among orthologues/paralogues. In fact, only for orthologues such as mNaipa and rNaip1 is the TE repertoire mostly in common (Figure 2.6 and data not shown), while all other copies bear little resemblance.
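For readers less familiar with dotplots, the principle behind the "lines of homology" discussed above can be sketched as follows. This is a deliberately simplified illustration and not the jdotter implementation: the function name and the identity threshold are hypothetical, matching is done by plain per-window identity rather than by a scoring matrix with greymap adjustment, and the brute-force loop would be slow on full 3 kb sequences.

```python
# Simplified dotplot sketch (hypothetical helper, not the jdotter algorithm).

def dotplot_matches(seq_a, seq_b, window=25, min_identity=1.0):
    """Return (i, j) window start positions where seq_a and seq_b are similar.

    A dot is recorded when the fraction of identical positions between the
    window starting at i in seq_a and the window starting at j in seq_b is at
    least min_identity. Runs of dots with constant j - i form a diagonal,
    i.e. a line of homology.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        window_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            window_b = seq_b[j:j + window]
            identical = sum(1 for x, y in zip(window_a, window_b) if x == y)
            if identical / window >= min_identity:
                hits.append((i, j))
    return hits


# Toy example with a 4 bp window: the shared "ACGTA" yields two consecutive
# dots on the same diagonal, (0, 2) and (1, 3).
print(dotplot_matches("ACGTACGT", "TTACGTAA", window=4))
```

In this picture, a conserved LTR flanked by unrelated sequence appears as a single diagonal segment in an otherwise empty plot, which is the pattern described for the rodent Naip paralogues.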
One interesting feature is the fact that the mNaipe/f and rNaip2 ORR1E LTR promoters have retained different LINEs at nearly corresponding positions, upstream of the transcriptional start sites (see Figure 2.2a). It is not known if these LINEs have any effect on the promoter function of the LTRs.

Figure 2.6. Comparison of genomic sequence surrounding the rodent Naip ORR1E LTRs. 3 kb of sequence centered around the ORR1Es was analyzed by dotplots; diagonal lines represent regions of homology between compared sequences. Light grey, dark grey, white, and black boxes represent LTR elements, SINEs, LINEs, and simple repeats, respectively.

2.3.5 Retroelement prevalence in IAP gene 5’ flanking regions

The fact that human and rodent NAIP genes have independently co-opted different LTRs as promoters is extremely unusual and prompted us to ask whether the anti-apoptotic function of these genes could somehow have increased the probability of such co-option events. For example, if such genes are generally enriched for LTR elements in their 5’ flanking genomic regions compared to genes at large, the probability that LTRs would be adopted as promoters would likely increase. We therefore computed the prevalence of LTRs and other retroelements in a 12.5 kb window of genomic sequence surrounding annotated TSSs of the 8 human IAP family genes (Deveraux and Reed, 1999; Liston et al., 2003): NAIP (BIRC1) and BIRC2-8. To put this result in context, we computed the distribution of LTR coverages for 1,000 sets of 8 genes chosen at random (see Materials and Methods). The same analysis was performed for 8 mouse IAP genes, a set including mNaipa, mNaipb, and Birc2-7. Importantly, we did not observe shared LTRs or other TE insertions between the different IAP family members, indicating the TE insertions were acquired independently (Table 2.S1 and data not shown).
The distributions of total LTR-derived sequence coverage for the sets of randomly chosen genes (Figure 2.7a and 2.7b, solid bars) and the LTR coverage for the IAP genes (Figure 2.7a and 2.7b, arrows) are indicated. The upstream 12.5 kb regions of human IAP genes are significantly enriched in LTR sequence, which comprises 9.75% of the bases. This level of LTR coverage puts IAP genes in the 97th percentile compared to random gene sets (Figure 2.7a). For the mouse IAP gene set, LTR sequence covered 13.8% of the bases. Only three of the 1,000 random gene sets were higher in LTR coverage than this value (Figure 2.7b). In addition to LTR elements, we performed an identical analysis for other types of retroelements. In contrast to LTRs, LINEs showed no enrichment in either human or mouse in IAP upstream regions (Figure 2.7c and 2.7d). Similar to LTRs, however, SINEs were overrepresented in the upstream regions of human and mouse IAP genes compared to random genes (Figure 2.7e and 2.7f). We noted that mNaipa and mNaipb are particularly LTR-rich compared to the other IAP genes (Table 2.S1). Therefore, to determine if retroelement coverage for the NAIP genes in particular was unusual, we repeated the same analyses, excluding the NAIP genes from the IAP gene groups. For the human IAP gene group, the fractional coverage by each retroelement type changed little when NAIP was excluded from consideration (Figure 2.7a, 2.7c, and 2.7e). By contrast, the high LTR coverage in the upstream region of mouse IAP genes ceases to be significant upon exclusion of the mouse NAIP genes themselves from the IAP gene set, although it remains above the mean (83rd percentile; Figure 2.7b).

Figure 2.7. Density of TE sequence in 5' flanking regions of IAP genes compared to random gene sets.
Coverage of LTRs, LINEs, and SINEs in human (A,C,E) and mouse (B,D,F) was assessed in a 12.5 kb window surrounding database annotated transcription start sites (TSSs), 10 kb upstream and 2.5 kb downstream of the 8 human and 8 mouse IAP genes. These values, shown by solid arrows, were compared to the coverage of each type of repeat for 1,000 sets of 8 random human and 8 random mouse genes. For the human IAP genes, while SINE enrichment approaches significance (95th percentile), LTRs are significantly enriched (97th percentile), and LINEs are not overrepresented (20th percentile) within the analyzed windows. For the mouse IAP genes, both LTRs (99th percentile) and SINEs (98th percentile) are significantly enriched around IAP 5' termini, while LINEs are not (18th percentile). Dashed arrows show retroelement coverage in the same window for IAP genes when the NAIP genes themselves are removed from the analysis.

2.4 Discussion

Here we have demonstrated that different endogenous LTRs serve as promoters of the mammalian NAIP genes. A recent study utilizing a large dataset of human and mouse TSSs generated by the cap analysis of gene expression (CAGE) approach (Carninci et al., 2006) has found that TSSs are subject to rapid evolutionary turnover and that some orthologous genes have TSSs in completely different positions (Frith et al., 2006). The NAIP genes are an example of such genes. It is also worth noting that the CAGE approach might miss some start sites provided by LTRs or other TEs due to difficulty in uniquely mapping short tags containing repetitive sequence, unless such TEs are sufficiently diverged from other copies. Indeed, the observation that the ORR1E LTR TSSs for the mNaipb gene are supported by CAGE tags (Figure 2.2b) reflects the divergence of this LTR from other copies in the genome. Thus, it is possible that a significant number of TE-derived TSSs remain to be detected.
For the few mammalian genes where use of an LTR as a promoter has been demonstrated, two typical situations exist. In the first scenario, an ancient LTR present in both human and mouse serves as a promoter for the orthologous genes. An example is the carbonic anhydrase gene (CA1), where an ancestral LTR drives erythroid-specific expression of the orthologues (van de Lagemaat et al., 2003). The more commonly documented situation is where a lineage-specific LTR acts as a gene’s promoter in one species but not the other, as illustrated by the β3GALT5 gene in human (Dunn et al., 2003) and various mouse genes including Spindlin (Peaston et al., 2004). The results of this study illustrate a third evolutionary scenario not previously reported: distinct LTR elements specific to the primate or rodent lineages have independently assumed roles as promoters for the NAIP orthologues. In human, NAIP was originally cloned from a fetal brain cDNA library (Roy et al., 1995), and the 5’ and 3’ termini were subsequently resolved (Chen et al., 1998). We noticed that the 5’ terminus of this form (U19251) and the 432 bp 5’ extended form identified by Xu et al. (Xu et al., 2002) in the THP1 leukemic cell line (AB048534) localized within a MER21C LTR. While unable to confirm these TSSs, we did observe a variant NAIP transcript which includes and extends upstream of the MER21C and adjacent MIR SINE (data not shown). This may simply be a result of spurious transcription, reportedly commonplace throughout the human genome (Cheng et al., 2005). Alternatively, it could point to existence of yet another NAIP promoter which could not be identified by 5’ RACE due to a size constraint or complex secondary structure. Surprisingly, through 5’ RACE we discovered that the HERV-P 3’ LTR embedded within the MER21C element appears to be a functional promoter in testis.
Earlier work identified NAIP expression in liver and placenta by Northern blot using a coding region probe and in spinal cord and lymphoblasts following nested RT-PCR spanning coding exons (Roy et al., 1995). Our expression screens by RT-PCR of a broad panel of tissues confirmed these findings and extended them to include all tested tissues. Constitutive NAIP expression most likely initiates within the non-LTR promoter we have identified here. Quantitative RT-PCR indicated that, in normal testis, the HERV-P LTR is a significant but relatively minor NAIP promoter (Figure 2.1b). Nonetheless, the activity of this LTR promoter in testis and previous description of the MER21C LTR promoter active in a leukemic cell line (Xu et al., 2002), coupled with reports of elevated NAIP expression in myelodysplastic syndromes and leukemia (Nakagawa et al., 2005; Yamamoto et al., 2004), provides an enticing model to study potential up-regulation of these LTR promoters in certain forms of cancer, possibly through hypomethylation, since both LTRs are CpG rich (unpublished observations). In rodents, the results presented here demonstrate that the mouse and rat Naip genes employ a common ORR1E LTR as their major promoter. ORR1s and MTs are rodent-specific LTR families within the MaLR superfamily (Smit, 1993), represented by >400,000 copies in the sequenced mouse genome (Waterston et al., 2002). The fact that the ORR1E LTR is the primary promoter for these genes is unusual, considering LTRs most often function as tissue-specific or alternative promoters (Dunn et al., 2003; Landry et al., 2002; Leib-Mosch et al., 2005; Medstrand et al., 2005; Peaston et al., 2004). Another intriguing finding is the fact that an MTC LTR has inserted into the mNaipe/f progenitor, and behaves as a secondary promoter. Thus, the NAIP locus represents an extremely rare case of repeated recruitment of distinct LTRs as promoters during the course of mammalian evolution.
In a previous study, we found that more rapidly evolving genes or mammalian-specific genes are more likely to incorporate TEs into their UTRs, compared to genes at large (van de Lagemaat et al., 2003). NAIP represents an example of such a gene since no non-mammalian orthologue is known and its rate of protein evolution, as measured by a human-rodent Ka/Ks value of 0.44 (The Adaptive Evolution Database, TAED (Liberles et al., 2001)), is above the median for all genes of 0.115 (Waterston et al., 2002). Ka/Ks is the normalized ratio of nonsynonymous to synonymous nucleotide substitution rates in coding sequence (Liberles et al., 2001; Waterston et al., 2002). Nonetheless, assuming roughly 20,000 orthologous genes between humans and mice and a ~0.7% frequency of human RefSeq genes employing LTR elements as promoters (van de Lagemaat et al., 2003) (unpublished observations), we predict just a single example of orthologous gene pairs having adopted lineage-specific LTR promoters by chance. Examples of the same primate locus acquiring independent Alu insertions have been reported (Ludwig et al., 2005), but we are unaware of other cases where distinct TEs provide regulatory function to orthologous genes. Remarkably, in both lineages, more than one LTR insertion contributes to NAIP promoter activity, a combination of events extremely unlikely to be due to chance alone.
Several potential factors that could have contributed to this phenomenon are presented below. The first factor could be that the region upstream of this gene is subject to a lower selective constraint compared to most other genes, resulting in TE accumulation and increasing the probability that some may assume a regulatory role. Indeed, the fact that NAIP is part of the IAP gene family, with potentially overlapping or redundant functions, may have resulted in increased host tolerance to regulatory change of any individual family member.
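The chance expectation quoted above (roughly one orthologous gene pair) can be reproduced with a short calculation. The sketch below is only illustrative: it assumes that the ~0.7% frequency reported for human RefSeq genes applies equally to mouse genes and that LTR promoter adoption is independent in the two lineages, neither of which is stated explicitly in the text. The function and variable names are hypothetical.

```python
# Back-of-envelope expectation for orthologous gene pairs in which BOTH
# genes have adopted a lineage-specific LTR promoter purely by chance.
# Assumptions (not stated in the text): the human frequency also holds
# for mouse, and adoption in the two lineages is independent.

def expected_pairs(n_pairs: int, p_human: float, p_mouse: float) -> float:
    """Expected number of pairs where both orthologues use an LTR promoter."""
    return n_pairs * p_human * p_mouse

n_orthologous_pairs = 20_000   # approximate human-mouse orthologues (from the text)
p_ltr_promoter = 0.007         # ~0.7% of genes using an LTR promoter (from the text)

expected = expected_pairs(n_orthologous_pairs, p_ltr_promoter, p_ltr_promoter)
print(f"Expected pairs by chance: {expected:.2f}")
```

Under these assumptions the expectation is just under one pair genome-wide, consistent with the single example predicted above. The calculation addresses chance alone; it does not bear on the lower selective constraint proposed above as the first contributing factor.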
Supporting this possibility is the fact that genomic coverage by LTR sequences and SINEs upstream of human and mouse IAP genes is above average (Figure 2.7). Moreover, the tandemly duplicated mouse Naip genes have a higher LTR coverage and insertion number relative to most other mouse IAP genes (Table 2.S1). These genes represent the high end of the genomic spectrum in terms of LTR and SINE density, which could indicate that their regulatory requirements are flexible and localized to small domains. Representing the opposite end of the spectrum are Hox genes and other critical transcription factor genes or developmental genes which are located in regions nearly devoid of all TEs (Lander et al., 2001; Simons et al., 2006), likely because their complex regulation requires extended regions to be free of interruptions. Interestingly, while LTR and SINE density 5’ of IAP genes is above average, LINE density is not (Figure 2.7), indicating that not all TEs have accumulated in the region. In addition, the high density of SINEs upstream of IAP genes may be related to the known role of the highly repetitive SINE sequences in facilitating genomic rearrangements (Bailey et al., 2003). The BIR domain was amplified to create the IAP family, NAIP genes have amplified variably in rodents (Growney and Dietrich, 2000), and two other IAPs, cIAP1 and 2, are tandemly duplicated copies present in primates and rodents (Young et al., 1999), implicating ongoing genomic rearrangements in IAP gene expansion. Moreover, while the IAP genes are classified as a gene family due to the shared BIR domain, mouse gene knock-out evidence suggests these proteins do not encode entirely overlapping functions. When only mNaipa is deleted, mice display poor neuronal survival under pathological conditions (Holcik et al., 2000). However, the effect of eliminating all mNaip copies remains unknown. 
Deletion of two other IAP family members, Survivin (Uren et al., 2000) and Bruce (Lotz et al., 2004), results in embryonic lethality. XIAP-deficient mice develop normally (Harlin et al., 2001), but recent reports indicate that it encodes a nonredundant function related to TRAIL-mediated apoptotic signaling (Cummins et al., 2004). Targeting of the cIAP2 locus leads to a defective innate immune response (Conte et al., 2006). Finally, ML-IAP is over-expressed in human melanoma cells (Vucic et al., 2000) and Ts-IAP expression is testis-specific (Richter et al., 2001). These non-overlapping phenotypes indicate that some degree of selection must operate on their regulatory regions.
A second potential explanation is that, compared to most genes, the 5’ flanking regions of NAIP may have been more receptive to initial retroviral or retroelement insertion, increasing the chance of LTR recruitment by this gene. Different classes of retroviruses and retroelements have distinct integration site preferences (Bushman et al., 2005). For example, human immunodeficiency virus (HIV) favors integration within active genes, murine leukemia virus (MLV) favors the 5’ ends of genes, Ty1 and Ty3 LTR retroelements of Saccharomyces cerevisiae target regions upstream of pol III-transcribed genes, and Ty5 targets heterochromatic regions (Bushman et al., 2005). An interesting recent report has documented that promoters of heat-shock genes in Drosophila are particularly prone to insertions by P elements, a very young family of DNA transposons, likely at least in part due to the unusual constitutively open chromatin associated with these genes (Walser et al., 2006). In the case under study here, since the HERV-P and MER21C elements upstream of the primate NAIP gene are members of the broad “class I” subdivision of ERVs (Jurka et al., 2005), which also includes MLV, it is possible that these ERVs also prefer 5’ flanks of genes for integration.
By contrast, the rodent ORR1E and MTC LTRs of the MaLR superfamily (class III in Repbase nomenclature (Jurka et al., 2005)) are not related to any elements with known integration site preferences. Thus, we cannot speculate as to whether such elements may have originally favored regions upstream of genes. It is known that the overall genomic densities of class III elements are highest in regions further from genes, compared to other ERV classes (Medstrand et al., 2002). Furthermore, it seems unlikely that the upstream region of NAIP specifically, compared to all genes, would present a favored integration target for widely different retroviral types in different species.
Since it is generally assumed that the genomic distribution patterns of ancient ERVs are shaped by selection and bear little resemblance to their original integration site preferences, which are unknown, a third hypothesis to account for repeated LTR co-option by NAIP is based on this gene’s function. Perhaps utilization of retroviral LTRs as promoters for NAIP is somehow advantageous to the host, resulting in their selective retention during evolution. For example, activation of NAIP via an LTR promoter may provide an avenue for germ cells to escape transitory, stress-induced apoptotic signals. LTR promoters may be particularly responsive to upregulation by cellular stresses since it has been shown that activation of human and mouse ERV LTRs can occur following stresses such as viral infection (Hampar et al., 1976; Ruprecht et al., 2006; Sutkowski et al., 2004) and UV irradiation (Frucht et al., 1991; Hohenadl et al., 1999). Various IAPs are expressed in human (Liston et al., 1996; Weikert et al., 2005), mouse (Matsumoto et al., 1999), and rat (Li et al., 1998; Wang et al., 2005) germ cells or their progenitors and it has been reported that Naip expression plays a role in mouse oocyte viability (Matsumoto et al., 1999).
Although nothing is known about a potential NAIP stress response in the germ line, it has been demonstrated that NAIP mRNA and protein are upregulated in neurons following ischemic stress (Xu et al., 1997). It is also interesting that activity of the human NAIP HERV-P LTR promoter is highest in testis and, in general, ERVs are transcribed highly in germ cells and early embryogenesis compared to most normal somatic cells (Peaston et al., 2004; Taruscio and Mantovani, 2004). While there is no evidence that other IAP genes, with the exception of NAIP, use LTR promoters, the proposed up-regulation may involve gene activation by nearby LTR enhancers, offering an explanation for the fact that LTR density upstream of IAP genes as a group is high compared to random genes. Alternatively, NAIP may be unique among IAP genes in recruiting LTR promoters because of its specialized functions or flexibility in regulatory control.
Finally, a related, but much more speculative, hypothesis to explain LTR usage by the NAIP genes postulates that the present state reflects a viral mechanism to evade apoptosis. Infection by retroviruses can lead to induction of apoptosis (Acheampong et al., 2005; Rainey and Coffin, 2006) and HIV Nef activates caspases (Acheampong et al., 2005), the targets of IAP proteins. Waves of intracellular retrotransposition can also be associated with increased apoptosis (Haoudi et al., 2004). Therefore, retroviral/retroelement insertions in germ line cells which, by chance, induce expression of anti-apoptotic genes could abort an initial or transitory stress-induced apoptotic response, increasing the probability that cells harboring such insertions would survive and contribute to subsequent generations, assuming they have not suffered damage.
In such a scenario, an LTR would only need to exert regulatory effects for a short window in time immediately after insertion, before being silenced (for example by DNA methylation), or it could continue to be used as a promoter if such activity is not detrimental to the organism, as in the case of the NAIP genes. Viruses have evolved numerous ways of circumventing host defense strategies and aborting apoptosis (Benedict et al., 2002). Indeed, one such example is the viral origin of the anti-apoptotic BIR domain, shared by all IAP genes (Liston et al., 1997). Perhaps repeated targeting of LTR elements to regulatory regions of NAIP genes represents another viral mechanism aimed at maintaining cellular viability. Nonetheless, retroviral or other TE insertions in the germ line will not be tolerated by the host species unless they are neutral, and fixed by random chance, or are advantageous. Thus, such hypothetical scenarios are tenable only if the LTR insertions do not have a detrimental impact on cell function or on organismal development.
In conclusion, we have shown here that endogenous retroviral LTRs have been repeatedly co-opted to serve regulatory roles for the mammalian NAIP genes and present various potential explanations to account for this phenomenon. These results document a striking example of how ancient ERV insertions can be domesticated or “exapted” (Brosius, 1999; Brosius and Gould, 1992) by the host, contributing to gene regulatory evolution.
1.6 References
Acheampong, E.A., Parveen, Z., Muthoga, L.W., Kalayeh, M., Mukhtar, M., and Pomerantz, R.J. (2005). Human Immunodeficiency virus type 1 Nef potently induces apoptosis in primary human brain microvascular endothelial cells via the activation of caspases. J Virol 79, 4257-4269. Bailey, J.A., Liu, G., and Eichler, E.E. (2003). An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet 73, 823-834. Benedict, C.A., Norris, P.S., and Ware, C.F.
(2002). To kill or be killed: viral evasion of apoptosis. Nat Immunol 3, 1013-1018. Brosius, J. (1999). RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238, 115-134. Brosius, J., and Gould, S.J. (1992). On "genomenclature": a comprehensive (and respectful) taxonomy for pseudogenes and other "junk DNA". Proc Natl Acad Sci U S A 89, 10706-10710. Bushman, F., Lewinski, M., Ciuffi, A., Barr, S., Leipzig, J., Hannenhalli, S., and Hoffmann, C. (2005). Genome-wide analysis of retroviral DNA integration. Nat Rev Microbiol 3, 848-858. Butler, J.E., and Kadonaga, J.T. (2002). The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev 16, 2583-2592. Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C.A., Taylor, M.S., Engstrom, P.G., Frith, M.C., et al. (2006). Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 38, 626-635. Chen, Q., Baird, S.D., Mahadevan, M., Besner-Johnston, A., Farahani, R., Xuan, J., Kang, X., Lefebvre, C., Ikeda, J.E., Korneluk, R.G., et al. (1998). Sequence of a 131-kb region of 5q13.1 containing the spinal muscular atrophy candidate genes SMN and NAIP. Genomics 48, 121-127. Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., et al. (2005). Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149-1154. Conte, D., Holcik, M., Lefebvre, C.A., Lacasse, E., Picketts, D.J., Wright, K.E., and Korneluk, R.G. (2006). Inhibitor of apoptosis protein cIAP2 is essential for lipopolysaccharide-induced macrophage survival. Mol Cell Biol 26, 699-708. Cummins, J.M., Kohli, M., Rago, C., Kinzler, K.W., Vogelstein, B., and Bunz, F. (2004).
X-linked inhibitor of apoptosis protein (XIAP) is a nonredundant modulator of tumor necrosis factor-related apoptosis-inducing ligand (TRAIL)-mediated apoptosis in human cancer cells. Cancer Res 64, 3006-3008. Deveraux, Q.L., and Reed, J.C. (1999). IAP family proteins--suppressors of apoptosis. Genes Dev 13, 239-252. Dunn, C.A., Medstrand, P., and Mager, D.L. (2003). An endogenous retroviral long terminal repeat is the dominant promoter for human beta1,3-galactosyltransferase 5 in the colon. Proc Natl Acad Sci U S A 100, 12841-12846. Frith, M.C., Ponjavic, J., Fredman, D., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., and Sandelin, A. (2006). Evolutionary turnover of mammalian transcription start sites. Genome Res 16, 713-722. Frucht, D.M., Lamperth, L., Vicenzi, E., Belcher, J.H., and Martin, M.A. (1991). Ultraviolet radiation increases HIV-long terminal repeat-directed expression in transgenic mice. AIDS Res Hum Retroviruses 7, 729-733. Growney, J.D., and Dietrich, W.F. (2000). High-resolution genetic and physical map of the Lgn1 interval in C57BL/6J implicates Naip2 or Naip5 in Legionella pneumophila pathogenesis. Genome Res 10, 1158-1171. Hampar, B., Aaronson, S.A., Derge, J.G., Chakrabarty, M., Showalter, S.D., and Dunn, C.Y. (1976). Activation of an endogenous mouse type C virus by ultraviolet-irradiated herpes simplex virus types 1 and 2. Proc Natl Acad Sci U S A 73, 646-650. Haoudi, A., Semmes, O.J., Mason, J.M., and Cannon, R.E. (2004). Retrotransposition-Competent Human LINE-1 Induces Apoptosis in Cancer Cells With Intact p53. J Biomed Biotechnol 2004, 185-194. Harlin, H., Reffey, S.B., Duckett, C.S., Lindsten, T., and Thompson, C.B. (2001). Characterization of XIAP-deficient mice. Mol Cell Biol 21, 3604-3608. Hohenadl, C., Germaier, H., Walchner, M., Hagenhofer, M., Herrmann, M., Sturzl, M., Kind, P., Hehlmann, R., Erfle, V., and Leib-Mosch, C. (1999).
Transcriptional activation of endogenous retroviral sequences in human epidermal keratinocytes by UVB irradiation. J Invest Dermatol 113, 587-594. Holcik, M., Thompson, C.S., Yaraghi, Z., Lefebvre, C.A., MacKenzie, A.E., and Korneluk, R.G. (2000). The hippocampal neurons of neuronal apoptosis inhibitory protein 1 (NAIP1)-deleted mice display increased vulnerability to kainic acid-induced injury. Proc Natl Acad Sci U S A 97, 2286-2290. Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R., and Shoemaker, D.D. (2003). Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302, 2141-2144. Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., and Walichiewicz, J. (2005). Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110, 462-467. Kroger, B., and Horak, I. (1987). Isolation of novel human retrovirus-related sequences by hybridization to synthetic oligonucleotides complementary to the tRNA(Pro) primer-binding site. J Virol 61, 2071-2075. Kwan, T., Benovoy, D., Dias, C., Gurd, S., Provencher, C., Beaulieu, P., Hudson, T.J., Sladek, R., and Majewski, J. (2008). Genome-wide analysis of transcript isoform variation in humans. Nat Genet 40, 225-231. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921. Landry, J.R., Rouhi, A., Medstrand, P., and Mager, D.L. (2002). The Opitz syndrome gene Mid1 is transcribed from a human endogenous retroviral promoter. Mol Biol Evol 19, 1934-1942. Leib-Mosch, C., Seifarth, W., and Schon, U. (2005). Influence of human endogenous retroviruses on cellular gene expression. In Retroviruses and primate genome evolution, E.D. Sverdlov, ed. (Austin, Texas, Landes Bioscience), pp. 123-143.
Li, J., Kim, J.M., Liston, P., Li, M., Miyazaki, T., Mackenzie, A.E., Korneluk, R.G., and Tsang, B.K. (1998). Expression of inhibitor of apoptosis proteins (IAPs) in rat granulosa cells during ovarian follicular development and atresia. Endocrinology 139, 1321-1328. Liberles, D.A., Schreiber, D.R., Govindarajan, S., Chamberlin, S.G., and Benner, S.A. (2001). The adaptive evolution database (TAED). Genome Biol 2, RESEARCH0028. Liston, P., Fong, W.G., and Korneluk, R.G. (2003). The inhibitors of apoptosis: there is more to life than Bcl2. Oncogene 22, 8568-8580. Liston, P., Roy, N., Tamai, K., Lefebvre, C., Baird, S., Cherton-Horvat, G., Farahani, R., McLean, M., Ikeda, J.E., MacKenzie, A., et al. (1996). Suppression of apoptosis in mammalian cells by NAIP and a related family of IAP genes. Nature 379, 349-353. Liston, P., Young, S.S., Mackenzie, A.E., and Korneluk, R.G. (1997). Life and death decisions: the role of the IAPs in modulating programmed cell death. Apoptosis 2, 423-441. Lotz, K., Pyrowolakis, G., and Jentsch, S. (2004). BRUCE, a giant E2/E3 ubiquitin ligase and inhibitor of apoptosis protein of the trans-Golgi network, is required for normal placenta development and mouse survival. Mol Cell Biol 24, 9339-9350. Ludwig, A., Rozhdestvensky, T.S., Kuryshev, V.Y., Schmitz, J., and Brosius, J. (2005). An unusual primate locus that attracted two independent Alu insertions and facilitates their transcription. J Mol Biol 350, 200-214. Majors, J. (1990). The structure and function of retroviral long terminal repeats. Curr Top Microbiol Immunol 157, 49-92. Matsumoto, K., Nakayama, T., Sakai, H., Tanemura, K., Osuga, H., Sato, E., and Ikeda, J.E. (1999). Neuronal apoptosis inhibitory protein (NAIP) may enhance the survival of granulosa cells thus indirectly affecting oocyte survival. Mol Reprod Dev 54, 103-111. Medstrand, P., van de Lagemaat, L.N., Dunn, C.A., Landry, J.R., Svenback, D., and Mager, D.L. (2005). 
Impact of transposable elements on the evolution of mammalian gene regulation. Cytogenet Genome Res 110, 342-352. Medstrand, P., van de Lagemaat, L.N., and Mager, D.L. (2002). Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res 12, 1483-1495. Nakagawa, Y., Hasegawa, M., Kurata, M., Yamamoto, K., Abe, S., Inoue, M., Takemura, T., Hirokawa, K., Suzuki, K., and Kitagawa, M. (2005). Expression of IAP-family proteins in adult acute mixed lineage leukemia (AMLL). Am J Hematol 78, 173-180. Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D., and Knowles, B.B. (2004). Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 7, 597-606. Rainey, G.J., and Coffin, J.M. (2006). Evolution of broad host range in retroviruses leads to cell death mediated by highly cytopathic variants. J Virol 80, 562-570. Richter, B.W., Mir, S.S., Eiben, L.J., Lewis, J., Reffey, S.B., Frattini, A., Tian, L., Frank, S., Youle, R.J., Nelson, D.L., et al. (2001). Molecular cloning of ILP-2, a novel member of the inhibitor of apoptosis protein family. Mol Cell Biol 21, 4292-4301. Roy, N., Mahadevan, M.S., McLean, M., Shutler, G., Yaraghi, Z., Farahani, R., Baird, S., Besner-Johnston, A., Lefebvre, C., Kang, X., et al. (1995). The gene for neuronal apoptosis inhibitory protein is partially deleted in individuals with spinal muscular atrophy. Cell 80, 167-178. Ruprecht, K., Obojes, K., Wengel, V., Gronen, F., Kim, K.S., Perron, H., Schneider-Schaulies, J., and Rieckmann, P. (2006). Regulation of human endogenous retrovirus W protein expression by herpes simplex virus type 1: implications for multiple sclerosis. J Neurovirol 12, 65-71. Shin, S.W., Lee, M.Y., Kwon, G.Y., Park, J.W., Yoo, M., Kim, S.K., Oh, T.H., and Choe, B.K. (2003). Cloning and characterization of rat neuronal apoptosis inhibitory protein cDNA. Neurochem Int 42, 481-491.
Simons, C., Pheasant, M., Makunin, I.V., and Mattick, J.S. (2006). Transposon-free regions in mammalian genomes. Genome Res 16, 164-172. Smit, A.F. (1993). Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucleic Acids Res 21, 1863-1872. Sutkowski, N., Chen, G., Calderon, G., and Huber, B.T. (2004). Epstein-Barr virus latent membrane protein LMP-2A is sufficient for transactivation of the human endogenous retrovirus HERV-K18 superantigen. J Virol 78, 7852-7860. Taruscio, D., and Mantovani, A. (2004). Factors regulating endogenous retroviral sequences in human and mouse. Cytogenet Genome Res 105, 351-362. Uren, A.G., Wong, L., Pakusch, M., Fowler, K.J., Burrows, F.J., Vaux, D.L., and Choo, K.H. (2000). Survivin and the inner centromere protein INCENP show similar cell-cycle localization and gene knockout phenotype. Curr Biol 10, 1319-1328. van de Lagemaat, L.N., Landry, J.R., Mager, D.L., and Medstrand, P. (2003). Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet 19, 530-536. Vucic, D., Stennicke, H.R., Pisabarro, M.T., Salvesen, G.S., and Dixit, V.M. (2000). ML-IAP, a novel inhibitor of apoptosis that is preferentially expressed in human melanomas. Curr Biol 10, 1359-1366. Walser, J.C., Chen, B., and Feder, M.E. (2006). Heat-shock promoters: targets for evolution by P transposable elements in Drosophila. PLoS Genet 2, e165. Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476. Wang, Y., Suominen, J.S., Parvinen, M., Rivero-Muller, A., Kiiveri, S., Heikinheimo, M., Robbins, I., and Toppari, J. (2005). The regulated expression of c-IAP1 and c-IAP2 during the rat seminiferous epithelial cycle plays a role in the protection of germ cells from Fas-mediated apoptosis. Mol Cell Endocrinol 245, 111-120.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-562. Weikert, S., Schrader, M., Christoph, F., Schulze, W., Krause, H., Muller, M., and Miller, K. (2005). Quantification of survivin mRNA in testes of infertile patients and in testicular germ cell tumours: high levels of expression associated with normal spermatogenesis. Int J Androl 28, 224-229. Wilhelm, B.T., Landry, J.R., Takei, F., and Mager, D.L. (2003). Transcriptional control of murine CD94 gene: differential usage of dual promoters by lymphoid cell types. J Immunol 171, 4219-4226. Xu, D.G., Crocker, S.J., Doucet, J.P., St-Jean, M., Tamai, K., Hakim, A.M., Ikeda, J.E., Liston, P., Thompson, C.S., Korneluk, R.G., et al. (1997). Elevation of neuronal expression of NAIP reduces ischemic damage in the rat hippocampus. Nat Med 3, 997-1004. Xu, M., Okada, T., Sakai, H., Miyamoto, N., Yanagisawa, Y., MacKenzie, A.E., Hadano, S., and Ikeda, J.E. (2002). Functional human NAIP promoter transcription regulatory elements for the NAIP and PsiNAIP genes. Biochim Biophys Acta 1574, 35-50. Yamamoto, K., Abe, S., Nakagawa, Y., Suzuki, K., Hasegawa, M., Inoue, M., Kurata, M., Hirokawa, K., and Kitagawa, M. (2004). Expression of IAP family proteins in myelodysplastic syndromes transforming to overt leukemia. Leuk Res 28, 1203-1211. Yi, J.M., Schuebel, K., and Kim, H.S. (2007). Molecular genetic analyses of human endogenous retroviral elements belonging to the HERV-P family in primates, human tissues, and cancer cells. Genomics 89, 1-9. Young, S.S., Liston, P., Xuan, J.Y., McRoberts, C., Lefebvre, C.A., and Korneluk, R.G. (1999). Genomic organization and physical map of the human inhibitors of apoptosis: HIAP1 and HIAP2. Mamm Genome 10, 44-48.  
CHAPTER 3 A NOVEL PROTEIN ISOFORM OF THE MULTICOPY NAIP GENE DERIVES FROM INTRAGENIC ALU SINE PROMOTERS¹
¹ A version of this chapter has been accepted for publication. Romanish, M.T., Nakamura, H., Lai, C.B., Wang, Y.Z., Mager, D.L. Novel protein isoforms of the multicopy human NAIP gene derive from intragenic Alu SINE promoters. PLoS ONE, In press. HN provided instruction for Western blot experiments. CBL confirmed utility of an AP-1 binding site of one NAIP-associated Alu. YZW provided reagents.
3.1 Introduction
Improved assembly of the region encoded by human chromosome 5q13.2 has clearly revealed that NAIP and surrounding genes experience copy number variation (Schmutz et al., 2004). While little or no expression data are specifically available for the NAIP copies other than NAIPfull, they are presumed to be pseudogenes since they are deleted for either the 5’ portions that harbour the known regulatory regions (Chapter 2) (Romanish et al., 2007; Xu et al., 2002) or for most coding exons. However, in this chapter I provide evidence that some of the other NAIP copies are also transcribed, albeit from the most unlikely of sources. I have shown in Chapter 2 that neuronal apoptosis inhibitory protein (NAIP) orthologues in human and mouse provide a remarkable example of LTR promoter exaptation: unrelated LTRs were independently acquired as gene promoters (Romanish et al., 2007). In this study the flexibility associated with NAIP regulation in human is further demonstrated by showing that 5’ truncated transcripts arise from two unique Alu SINEs. This is unexpected, since Alu elements are not regarded as a significant source of regulatory sequences for the transcription of cellular genes, due to their dependence on RNA pol III. Intriguingly, I also show that the open reading frame resulting from Alu-mediated NAIP transcription yields a novel protein isoform that only encodes the signature NLR domains.
This additional research further substantiates the view that TEs are a readily available source of transcriptional regulatory signals and highlights a potentially important role for Alu elements in mediating the transcription of cellular genes.
3.2 Materials and methods
3.2.1 Ethics statement
The blood sample was obtained with written informed consent according to a protocol approved by the University of British Columbia Research Ethics Board.
3.2.2 RNA and reverse transcription
With the exception of blood, all human RNA was purchased from Clontech (Mountain View); each sample consists of pooled material from multiple individuals. Blood was obtained from a healthy human adult with informed consent and the sample subsequently underwent erythrocyte reduction. RNA from the remaining peripheral blood leukocytes (PBLs) was isolated using the QIAamp RNA Blood Mini Kit (Qiagen). Where necessary, RNA was isolated from candidate cell lines using TRIzol (Invitrogen) according to the manufacturer’s recommendations. Prior to reverse transcription, RNA was quantified using a Qubit fluorometer (Invitrogen). All cDNA was synthesized with random hexamer-primed Superscript III Reverse Transcriptase (Invitrogen), as directed by the manufacturer.
3.2.3 RT-PCR
All RT-PCR, except as indicated below for amplification of the NAIP ORF and generation of the expression vector, was performed with Platinum Taq DNA Polymerase (Invitrogen); the relevant primers are listed in Table B.1 and were all used at 10 µM. Optimal primer annealing temperatures were deduced using the temperature gradient function of an iCycler (BioRad) over 35 cycles. Subsequent experiments were carried out at the optimal Tm for each primer set in a GeneAmp PCR System 9600 (Applied Biosystems). Discrimination of 5’ vs 3’ NAIP transcript levels was carried out at 30 cycles. The full-length NAIP ORF deriving from the Alu SINEs was obtained by amplification with Phusion High Fidelity DNA Polymerase (Finnzymes).
As expected, primers within Alu SINEs yielded a multitude of products, which were subsequently resolved by Southern blotting. Probe was generated with [32P]dCTP using the random primer labeling kit (Invitrogen) as directed. Pre-hybridization, hybridization, and washes of Zetaprobe GT membranes (BioRad) were performed using ExpressHyb (Clontech) according to manufacturer’s specifications. Exposure of BioMax Film (Kodak) for one hour or less was sufficient to adequately differentiate true bands from background.
3.2.4 5’ Rapid Amplification of cDNA Ends (5’ RACE)
Using the First-choice RLM RACE Kit (Ambion), the 5’ termini of human NAIP were deduced as before (Romanish et al., 2007). We revised our initial approach (Romanish et al., 2007) by designing gene-specific reverse primers to a downstream exon common to all predicted NAIP copies (primers listed in Table B.1); the primers used previously could only detect expression of NAIPfull. Because the full complement of NAIP start sites was being queried, subtle variations in RT-PCR product size were observed across a range of Tms (55-60 °C); therefore, all unique bands were purified using the QIAquick Gel Extraction Kit (Qiagen) and cloned into the pGEM-T vector (Promega) prior to sequencing (McGill University and Génome Québec Innovation Centre). Importantly, consistent amplification patterns were observed within a given Tm. We similarly tested mouse kidney RNA; although we identified novel intraexonic start sites for mNaip2, qRT-PCR only showed a slight increase (1.2:1) of 3’ over 5’ ends (data not shown).
3.2.5 Quantitative RT-PCR
The cDNA used for quantitative RT-PCR with Power SYBR Green PCR Master Mix (Applied Biosystems) in the ABI 7500 Real Time PCR System (Applied Biosystems) was prepared as above. Primers (10 µM; listed in Table B.1) were determined by standard curve to amplify equally efficiently across a broad range of template dilutions.
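The standard-curve check on primer efficiency, and the relative quantification that relies on it, can be sketched as follows. This is a minimal illustration rather than the analysis actually performed: the relation E = 10^(-1/slope) - 1 and the 2^(-ΔΔCT) form are the conventional textbook expressions and are assumed here, the function names are hypothetical, and every CT value is invented for demonstration only.

```python
# Sketch of two routine qRT-PCR calculations (all numbers are invented).

def standard_curve_slope(log10_amounts, ct_values):
    """Least-squares slope of CT plotted against log10(template amount)."""
    n = len(log10_amounts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_amounts, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    return sxy / sxx

def amplification_efficiency(slope):
    """Conventional efficiency estimate; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0

def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """2^(-ddCT): target normalized to a reference gene (e.g. actin),
    expressed relative to a calibrator sample. Assumes both primer sets
    amplify with near-equal efficiency."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2 ** -(delta_sample - delta_calibrator)

# Hypothetical ten-fold dilution series: 1, 0.1, 0.01, 0.001 of the stock.
log10_amounts = [0, -1, -2, -3]
ct_values = [20.0, 23.32, 26.64, 29.96]

slope = standard_curve_slope(log10_amounts, ct_values)
efficiency = amplification_efficiency(slope)
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)

print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
print(f"fold change relative to calibrator = {fold_change:.1f}")
```

A slope near -3.32 corresponds to an efficiency near 100%; primer sets whose slopes agree closely can reasonably be compared with the ΔΔCT approach, whereas divergent slopes would call for an efficiency-corrected calculation.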
The comparative CT method was used to quantify targets; CT values were normalized to β-actin levels in each tissue and expressed relative to the indicated target in the indicated tissues. Experiments were conducted at least four times for each primer set, with cycling parameters as follows: 50 °C, 2 min; 95 °C, 10 min; [95 °C, 15 s; 60 °C, 1 min] × 40 cycles. For initial experiments, where primer efficiencies were being determined, dissociation curves and –RT controls were included, indicating the specificity of amplification and lack of DNA contamination in template preparations, respectively (data not shown). Alternative splicing variants posed a problem in primer design for the NAIPERV-P and NAIPSg targets. For NAIPERV-P we quantified only one of the variants and estimated that it accounted for ~40% of all LTR-derived transcripts, as before (Romanish et al., 2007). For NAIPSg, we designed primers spanning exon junctions of both isoforms and combined their proportions.
3.2.6 Generation of constructs
Placental genomic DNA was obtained from the laboratory of Dr. P. Medstrand (Lund University) and subsequently used to PCR amplify the NAIP promoter regions and open reading frame (ORF). Promoter constructs. Testis-specific LTR (or NAIPERV-P), the ubiquitous NAIPfull, and the Alu-derived NAIPSg and NAIPJb promoters were amplified by PCR using Phusion High Fidelity DNA Polymerase (Finnzymes) in an iCycler (BioRad) over 35 cycles; the primers used are listed in Table B.1. The respective products are approximately 500 bp and centered on the transcription start sites. All primers possessed BglII and HindII recognition sites to facilitate directional cloning into a modified pGL3B vector described elsewhere (Romanish et al., 2007). Sequencing (McGill University and Génome Québec Innovation Centre) verified fidelity of amplified fragments. Expression vector.
The preserved ORF deriving from NAIPSg and NAIPJb transcripts was amplified by Phusion High Fidelity DNA Polymerase (Finnzymes) from human testis cDNA (as described above) over 35 cycles; primer sequences are indicated in Table B.1. The desired amplicon was isolated using the PureLink Quick Gel Extraction Kit (Invitrogen) and subsequently dATP-tailed with Taq DNA Polymerase (Invitrogen) to facilitate cloning into the pGEM-T vector (Promega). Sequencing not only confirmed that the ORF was cloned error-free, but also that NAIP2 is expressed, in addition to NAIPfull, on account of a single representative nucleotide difference. XhoI and NcoI recognition sites incorporated into the primers were used to subclone the sequenced ORF into the CTV 211 hemagglutinin (HA) epitope-bearing mammalian expression vector, generously provided by Dr. R. Kay (Terry Fox Laboratory). All vectors were amplified in E. coli DH5α, purified using the Nucleobond AX (Clontech) maxi prep kit, and quantified using the Qubit fluorometer (Invitrogen).

3.2.7 Cell culture and transient transfection

HeLa, NTera2D1, LNCaP, and Jeg3 cells were cultured in DMEM (Stem Cell Technologies) and PC3 cells in RPMI 1640 (Stem Cell Technologies), and incubated at 37 °C and 5% CO2. All media formulations were supplemented with 10% Fetal Bovine Serum (Invitrogen) and maintained in penicillin/streptomycin, except during transfection experiments. Prior to transfection of promoter constructs, cells were seeded at 10^5 cells/well, or 2 × 10^5 cells/well for NTera2D1, in a 24-well dish overnight. Lipofectamine 2000 (Invitrogen) was used to transfect the indicated cells with the indicated vectors according to the manufacturer’s specifications. Approximately 6-8 hours post-transfection, cells were washed with PBS (Stem Cell Technologies) and fresh complete media was added to allow production of the reporter for an additional ~24 hours.
The HA:NAIP expression vector was transiently transfected into HeLa, PC3, and NTera2D1 cells using Metafectene (Biontex) as recommended by the manufacturer.

3.2.8 Reporter gene assays

Prior to lysis, cells were washed with PBS, processed, then analyzed for firefly and Renilla luciferase activity using the Dual Luciferase Reporter Assay System (Promega) as indicated by the manufacturer. All values were standardized to the Renilla luciferase internal control to normalize for transfection efficiency, then expressed relative to the modified promoterless pGL3-Basic vector.

3.2.9 Western blotting

Cells were grown in 10 cm dishes as indicated above. The human PC3, NTera2D1, and HeLa cell lines were selected to screen for NAIP proteins based on preliminary RT-PCR findings (data not shown). Cells transfected with the expression vector encoding the Alu-derived NAIP ORF, or untransfected controls, were harvested by either scraping or trypsinization following two washes with cold PBS. Cell pellets were obtained by centrifugation and resuspended in RIPA (150 mM NaCl; 1% NP-40; 0.5% sodium deoxycholate; 0.1% SDS; 50 mM Tris, pH 8) and NP40 (150 mM NaCl; 1% NP-40; 50 mM Tris, pH 8) lysis buffers supplemented with a protease inhibitor cocktail (Roche), and subsequently quantified using the Qubit Fluorometer (Invitrogen). Hemagglutinin epitope signal was easier to detect in NP40 lysates, while RIPA provided clearer results for the NAIP-specific antibody. Bi-phased gels containing TEMED and APS (4% stacking, 9% separating) were used to resolve total cellular protein in electrophoresis running buffer (10X: 25 mM Tris; 192 mM glycine; 0.1% SDS). Subsequently, separated proteins were transferred using a Hoefer TE 22 tank transfer unit (Amersham Biosciences) onto Immobilon-P PVDF membrane (Millipore) in fresh transfer buffer (25 mM Tris, 192 mM glycine, 10% methanol, 0.1% SDS).
To assess NAIP protein isoforms in primary human tissues, an IMB-10350 Instablot membrane was purchased from Imgenex (San Diego). Blocking of all membranes was performed in 5% reconstituted skim milk powder under constant agitation at 4 °C overnight. The following morning, blocking solution was replaced and fresh primary antibodies were applied at 1:1000 NAIP (Abcam), 1:3500 Actin (Sigma), and 1:3500 HA (BAbCO) for one hour at room temperature under constant agitation. Washes were carried out with TBS-T (10X: 20 mM Tris; 1.4 M NaCl; 1% Tween-20) at room temperature in 5-minute intervals, no more than five times. Secondary antibody was diluted in fresh TBS-T and 1% blocking solution to a final concentration of 1:100,000, and incubated for one hour at room temperature under constant agitation. Washes were conducted as above. Proteins were detected using the Enhanced Chemiluminescence Kit (Perkin Elmer) and Kodak BioMax Film and cassettes (Kodak). Where necessary, the Instablot was stripped with 0.2 M NaOH; all other membranes were cleared with an acidic strip solution (25 mM glycine-HCl pH 2, 1% SDS).

3.2.10 Computational tools

Dot plots. Analysis of the underlying DNA sequence of 5q13.2 was performed to better understand the exons mapping to particular NAIP copies. DNA sequences were obtained from the UCSC Human Genome Browser March 2006 (hg18) assembly (Kent et al., 2002). The genomic sequence of NAIPfull (chr5:70,298,269-70,360,000) was used to assess exon architecture of the remaining copies: NAIP1 (chr5:70,425,120-70,469,539); NAIP2 (chr5:69,424,009-69,495,811); and !NAIP1 and 2 (chr5:69,780,634-69,828,298; 68,921,612-68,967,595). The indicated sequences were compared using the web-based jdotter (http://athena.bioc.uvic.ca/workebnch.php?tool+jdotter&db=).

Sequence Analysis. Sequenced clones were uploaded, managed, and analyzed in the SDSC Biology Workbench (http://workbench.sdsc.edu).
Precise mapping of the clones to the human genome was completed using the BLAT tool in the UCSC Genome Browser (Kent et al., 2002).

ORF prediction. Sequences of interest were scanned for open reading frames using NCBI’s ORF Finder, and subsequent analysis of encoded domains was completed with BLASTP.

3.3 Results

3.3.1 Human NAIP is a multicopy gene

Copy number variation (CNV) exists in the region of human chromosome 5q13.2 encoding NAIP and other genes (Chen et al., 1998; Schmutz et al., 2004; Tran et al., 2008), as it does among inbred mouse strains (Diez et al., 2003). In the reference human genome at least five copies are annotated (Kent et al., 2002) (Figure 3.1a). Only one of these, NAIPfull, is full length; the others are assumed to be pseudogenes since two are 5’- and two are 3’-deleted, NAIP1 & 2 and "NAIP1 & 2, respectively (Figure 3.1a, b). Exon content of the NAIP paralogues was verified using dot plots (Figure B.1). While assessing their transcription using a variety of RT-PCR primer sets, we found that 3’ transcript levels of NAIP are greater than 5’ transcript levels in most tissues. In general, NAIP 5’ and 3’ transcripts showed the smallest differences in the macrophage-rich lung, spleen (Figure 3.1c), and blood (Figure B.2). Expression of NAIP in these tissues most likely results from infiltration by macrophages (Maier et al., 2007), the cell type mediating NAIP-dependent L. pneumophila immunity. The largest difference is observed in testis, where 3’ levels are >40-fold above 5’ levels. Interestingly, in liver 5’ levels of NAIP are the highest (Figure 3.1c), potentially arising from transcription of 3’-deleted isoforms, premature polyadenylation, or a CNV-associated anomaly within the tissue sample screened. The abundance of 3’ transcripts raises the possibility that the 5’-deleted copies, NAIP1 and NAIP2, are expressed (Figure 3.1c, Figure B.2), or that internal promoters of NAIPfull produce transcripts lacking the 5’ end, or both.

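The 5’ versus 3’ comparisons above rest on the comparative CT calculation described in section 3.2.5. As a minimal illustrative sketch of that arithmetic (not the analysis code used for this work), the Python function below normalizes a target CT to a reference gene such as β-actin and expresses the result relative to a calibrator sample. The function name, the CT values, and the assumption that product doubles each cycle (efficiency of 2.0) are hypothetical and chosen only for illustration.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator,
                        efficiency=2.0):
    """Comparative CT (delta-delta CT) estimate of relative expression.

    ct_target / ct_reference: CT of the target and the reference gene
        (e.g. beta-actin) in the sample of interest.
    ct_target_calibrator / ct_reference_calibrator: the same two values
        in the calibrator sample (e.g. kidney 5' in Figure 3.1c).
    efficiency: fold amplification per cycle; 2.0 assumes perfect
        doubling, which is an assumption of this sketch.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return efficiency ** (-delta_delta_ct)


# Hypothetical CT values, invented for illustration only.
fold = relative_expression(ct_target=24.0, ct_reference=18.0,
                           ct_target_calibrator=28.0,
                           ct_reference_calibrator=18.0)
print(fold)  # 16.0
```

With these invented values the target crosses threshold four cycles earlier than in the calibrator after normalization, which corresponds to a 16-fold higher relative level under the doubling assumption.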
Figure 3.1. Expression of predicted NAIP copies in the sequenced human genome. A) General landscape of chromosome 5q13.2, including the NAIP (black arrows), GUSBP1 (grey arrows), and surrounding genes (white arrows). B) Exon architecture of the annotated NAIP copies, verified by dot plots (Figure B.1). Slanted lines delimit deletions relative to NAIPfull. Diagrams are not drawn to scale. C) qRT-PCR with primers indicated by small arrowheads in panel B to determine the overall levels of NAIP 5’ (light bars) vs 3’ (dark bars) transcription. Values are normalized to β-actin levels in each tissue, and shown relative to kidney 5’. Each bar represents the mean of at least five independent experiments ± SD.

3.3.2 Novel human NAIP transcription start sites

The observation that levels of 5’ vs. 3’ transcription are not uniform across various human tissues prompted an analysis to determine where NAIP transcription was initiating. Previously, we showed that an upstream ERV-P LTR is a promoter of NAIPfull specifically in testis, but that ubiquitous expression derives from within an exon in the 5’ UTR (Romanish et al., 2007). Moreover, a previously published transcription start site (Xu et al., 2002) overlaps a MER21C LTR slightly upstream of the ERV-P, but could not be confirmed by 5’ RACE. However, an RT-PCR approach using tiled primers, similar to that of Xu et al. (2002), indicated that an adjacent AluSx SINE was also included in these transcripts (Figure B.3). We are unable to conclude whether this SINE is in fact a site of NAIP transcription initiation or an internal exon of an undescribed 5’ UTR. Here we revised our previous 5’ RACE approach, which only assessed the transcription start sites (TSS) associated with expression of NAIPfull (Romanish et al., 2007), and numerous novel TSS were discovered (Figure 3.2).
Unexpectedly, we observed that two Alu SINEs located 5’ of exon 10, an AluSg and an AluJb, are sites of NAIP transcriptional initiation, hereafter referred to as NAIPSg and NAIPJb (Figure 3.2a). These Alus are in the antisense orientation, full-length (~300 bp), and present in NAIP orthologues of New and Old World primates (data not shown). Since sequence identity hinders their unambiguous mapping, NAIPSg and NAIPJb 5’ RACE clones could arise from three of the five copies (NAIPfull, NAIP1, and NAIP2) in the reference human genome (Figure B.4). Thus, either NAIP1 and/or NAIP2 are expressed from Alus, or these Alus may serve as promoters within NAIPfull, or both. NAIPSg clones were obtained that mapped to two distinct TSS localizing in the 3’ terminus of the Alu, within its A-rich tail (Figure B.4a). Interestingly, the AluSg A-rich tail is known to be hypermutable (Arcot et al., 1995; Economou et al., 1990); however, the corresponding region of this particular element is identical to its consensus sequence. The upstream ~9 kb (relative to NAIPSg polarity) is a patchwork of LINE fragments and Alus, and likely contributes additional regulatory signals. All NAIPSg clones splice into the adjacent exon 8 (Figure 3.2a, Figure B.4a), utilizing a splice donor site frequently employed by exonized antisense Alus (Makalowski et al., 1994; Sorek et al., 2002). Several NAIPJb clones were also obtained and map to two particular regions localized near the AluJb 5’ terminus (Figure B.4b). The regulatory signals comprising the NAIPJb core promoter, therefore, are expected to lie within the body of this Alu. The NAIPJb clones, however, do not splice into the downstream exon 10; rather, transcription continues through the intervening ‘intron’. The validity of NAIPJb transcripts is verified by +/- RT controls (Figure B.5).
Interestingly, the splice donor sequence utilized by NAIPSg has undergone an AG → AT transversion mutation in NAIPJb (Figure B.4b); its capacity for splicing has not been studied here. Additional TSS downstream of NAIPJb, in the intervening sequence adjacent to exon 10, are also observed (Figure B.4b).

Figure 3.2a. Identification of novel NAIP transcription start sites. Diagram of transcription start sites identified in the NAIPfull (top) and NAIP1/2 (bottom) copies by 5’ RACE. In the center, shaded block arrows indicate polarity of genes encoded on 5q13.2 (as in Figure 3.1a) and enlargements of NAIPfull and NAIP1/2 are shown above and beneath this representation. Their orientation is shown opposite to that in which they are encoded, and black boxes represent exons. Checkered and striped block arrows indicate localization and orientation of Alus and the previously identified NAIP LTR promoters (Romanish et al., 2007), respectively. Not all repeat elements are shown. Black double arrowheads represent primers used in nested RT-PCR to uncover NAIP TSS in this and a previous analysis (Romanish et al., 2007), represented by stick diagrams in top- and bottom-most images. All sequenced clones arising from Alus, and neighbouring TSS, map with perfect identity to NAIPfull, NAIP1, and NAIP2. Gene diagrams are not drawn to scale.

Another site of transcription initiation was identified within the final intron of the GUSBP1 gene (Figure 3.2a). Although sequence identity hinders unambiguous mapping of this transcript, the novel first exon splices into exon 4 of the adjacent NAIP1 and/or NAIP2. Consequently, expression of at least one other NAIP copy, in addition to NAIPfull, is demonstrated, since a TSS within the final intron of the GUSBP1 gene is adjacent only to NAIP1 and NAIP2.

3.3.3 Promoter activity of proximal NAIPSg and NAIPJb sequences

Particularly intrigued by the Alu TSS, we tested the capacity of the underlying sequences as pol II promoters in reporter gene assays, relative to the 5’ promoters we previously identified (Romanish et al., 2007). Indeed, the ubiquitous NAIPfull and LTR-derived, testis-specific NAIPERV-P are capable promoters in the NTera2D1, HeLa (Figure 3.2b), and Jeg3 (data not shown) cell lines. A >500 bp DNA fragment underlying the NAIPJb TSS, including the ~200 bp of upstream Alu sequence and extending 5’ toward exon 10, exhibits strong promoter activity (Figure 3.2b). Similarly, a 600 bp fragment centered on the NAIPSg TSS, containing the entire AluSg and the upstream 300 bp of internal L1 sequence, also exhibits considerable promoter activity relative to an empty vector control, in fact comparable to the LTR (Figure 3.2b). Due to the location of the AluSg TSS, the upstream L1 fragment likely contributes promoter regulatory motifs, but its position relative to a full-length L1 does not correspond to the previously described antisense L1 promoter (Nigumann et al., 2002). Analysis of the nucleotide sequences underlying the NAIPSg and NAIPJb TSS revealed several putative pol II regulatory motifs, including TATA-like boxes, initiator sequences, and downstream promoter elements (Figure B.4) (Butler and Kadonaga, 2002). Accumulating evidence indicates that numerous pol II transcription factor binding sites lie within Alu elements (Shankar et al., 2004; Tomilin, 2008). Indeed, both NAIP-associated Alus possess potential AP-1 and retinoic acid- and estrogen response element binding motifs (Figure B.4a and B.4b), in agreement with published consensus sequences (Shankar et al., 2004).

Figure 3.2b. Identification of novel NAIP transcription start sites. Novel regulatory regions associated with NAIP transcription.
Luciferase assays were performed using reporter constructs centered on the previously identified ERV-P and NAIPfull, and the NAIPSg and NAIPJb TSS identified here (indicated by bent arrows). The fragments tested are denoted by solid bars beneath the magnified NAIPfull image (top), and are labeled accordingly. Exons, Alus, and LTR elements are indicated as in Figure 3.2a; here, LINE fragments are indicated as speckled arrows. Values are normalized to an internal control (Renilla luciferase) and expressed relative to a promoter-less control vector (pGL3-Basic). Each bar represents the mean of at least four independent experiments ± SD. Gene diagrams are not drawn to scale.

3.3.4 Variable contribution of Alu-associated NAIP transcripts in different tissues

To address the contribution of Alu-derived NAIP transcripts to total NAIP expression, qRT-PCR was performed. Transcription of both Alu-derived forms is detected in most tissues screened by RT-PCR (Supplementary Figure 3.S5), and qRT-PCR indicates that NAIPJb is expressed at levels similar to or higher than those of NAIPfull in many of the tissues tested; the AluJb is therefore likely an important promoter (Figure 3.3). In contrast, NAIPSg does not contribute significantly to total NAIP expression in any tissue tested (Figure 3.3). Interestingly, scrutiny of 5’ RACE sequences revealed that NAIPSg undergoes A→G RNA editing in its 5’ UTR (Supplementary Figure 3.S4a), a common observation among transcribed Alus (Kim et al., 2004; Lev-Maor et al., 2008). Comparison of edited vs. un-edited NAIPSg transcript levels indicated the former is >10-fold more abundant than the latter (data not shown).

Figure 3.3. Contribution of Alu-initiated isoforms to total NAIP transcription. Expression levels of the targets NAIPTotal (3’), NAIPfull (5’), NAIPJb, and NAIPSg were normalized to β-actin and are shown relative to 3’ levels of NAIP transcription in the indicated tissues.
Each bar represents the mean of at least five independent experiments ± SD.

Most NAIP transcription in colon, spleen, lung, and prostate could be accounted for by the combined activity of all queried promoters, but the contribution of individual paralogues could not be assessed due to their high sequence identity. However, in kidney and testis not all isoforms are detected, and it is likely that unaccounted 3’ transcription initiates either downstream of AluJb, as indicated above (Figure B.4b), or from the NAIPGUSBP1 TSS. The contribution of NAIPGUSBP1-derived transcripts could not be assessed due to the complexity of alternative splicing in this 5’ UTR (Figure B.5). As discussed previously, the 5’ levels of NAIP in liver are 4-fold higher than 3’ levels, indicating that all transcription in this tissue derives from NAIPfull. Since two independent liver RNA samples were screened, this argues against patient-specific CNV, unless both samples derive from the same patient. Perhaps transcription in liver produces isoforms that constitutively omit one or both exons to which our 3’ qRT-PCR primer sets are designed. Alternatively, NAIPfull transcripts in this tissue could be aberrantly polyadenylated. Regardless, neither NAIPSg nor NAIPJb is highly expressed in liver.

3.3.5 Full-length Alu-derived transcripts are broadly expressed

The fact that the AluJb functions as a pol II promoter is an intriguing finding, with genome-wide ramifications for the establishment of transcriptional networks, as previously suggested (Feschotte, 2008; Shankar et al., 2004). We next examined the potential for transcription of a novel NAIP ORF as a result of Alu promoter activity. Indeed, if all downstream exons are included in at least some Alu-derived NAIP transcripts, a 2,643 nucleotide ORF is preserved (Figure B.6). Therefore, we sought to determine by RT-PCR whether Alu-initiated transcripts continue to the 3’ terminus.
Southern blotting, using unique probes, was required since, by necessity, primers hybridized to Alus, the most plentiful elements in primate genomes (Lander et al., 2001). Across all tissues screened, except liver, products corresponding to the expected size (~3 kb) were resolved for NAIPJb (Figure 3.4). Among various minor forms, one notable variant of ~2 kb is expressed at the same frequency as full-length NAIPJb. This ~2 kb variant, among numerous others including full-length, is also observed for NAIPSg transcripts in several tissues (data not shown). Potentially, the smaller isoform could result from alternative splicing common to both NAIPJb and NAIPSg transcripts, between the site of reverse primer binding and probe hybridization. Alternatively, a single NAIP transcript possessing a second exonized Alu downstream of some or all of the probe-binding region could also explain this observation. The prominent ~3 and ~2 kb bands do not result from the simultaneous amplification of NAIPJb and NAIPSg due to primer cross-reactivity, since the respective transcripts and their unique 5’ UTRs are roughly equal in size. Nonetheless, the existence of full-length Alu-derived transcripts, a potential 2,643 nucleotide ORF, and numerous in-frame ATGs in accordance with derived consensus sequences (Kozak, 1987; Nakagawa et al., 2008) (Figure B.6) points to a potential for the synthesis of NAIP protein isoforms.

Figure 3.4. Expression of full-length NAIPJb transcripts across many tissues. At top, a schematic diagram of the 3’ terminus of NAIP is shown, not to scale. Exons are indicated by black boxes; checkered and spotted arrows indicate the polarity of SINEs and LINEs, respectively. Not all repeat elements are shown. The arrowheads represent primers used to assess full-length NAIP transcription.
Due to the high copy number of Alus in the human genome, the resultant RT-PCR gels were resolved by Southern blotting, with the unique probe shown, across the indicated tissues to reveal true AluJb-derived NAIP transcripts.

3.3.6 Novel human NAIP protein isoforms

Using the annotated copies of NAIP in the sequenced human genome as a reference (Kent et al., 2002), we scanned all possible full-length transcripts that could arise from the novel TSS reported above for ORFs and domain composition. Many potential ORFs were identified for each queried transcript, but only the longest examples were considered. Interestingly, all accepted examples represented N-terminal truncations of NAIPfull, indicating the existence of numerous potentially functional in-frame translation initiation codons (Figure 3.5a, Figure B.6). NAIPfull was previously shown to comprise 1403 amino acids and yield a ~160 kDa protein encoding three N-terminal anti-apoptotic Baculoviral IAP Repeat (BIR) domains, followed by a central nucleotide binding domain (NBD) and C-terminal leucine-rich repeats (LRR) (Roy et al., 1995). NAIPSg- and NAIPJb-mediated transcription of NAIP2 is predicted to generate an ORF 881 amino acids long, corresponding to a 110 kDa protein that excludes the BIRs (NAIPAlu). Due to the deletion of exons 12-14 in NAIP1, a C-terminal truncation of the LRRs is also predicted, in addition to a truncation of its N terminus (Figure 3.1b); this could produce a ~85 kDa NAIP protein isoform, but none was detected. Finally, transcription from the promoter within the final GUSBP1 intron can drive expression of both NAIP1 and NAIP2, and potentially gives rise to 100 kDa (NAIP1) and 130 kDa (NAIP2) proteins, respectively. Both putative protein isoforms, NAIP1 and NAIP2, possess one N-terminal BIR domain, followed by the central NBD, but only NAIP2 harbours C-terminal LRRs.
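The ORF scan described above was carried out with NCBI’s ORF Finder. Purely to illustrate the idea of keeping only the longest ATG-to-stop reading frame, a minimal Python sketch follows. It is not the tool used here; the function name and the toy sequence are hypothetical, and it searches only the three forward frames of the sequence it is given.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(sequence):
    """Return (start, end, length_nt) of the longest ORF on the forward strand.

    An ORF is taken to run from the first ATG in a frame to the next
    in-frame stop codon; `end` is exclusive and includes the stop codon.
    Returns None if no complete ORF is found.
    """
    sequence = sequence.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if best is None or length > best[2]:
                    best = (start, i + 3, length)
                start = None  # look for the next ORF in this frame
    return best


# Toy sequence, invented for illustration: ATG AAA TGG TAA in frame 2.
start, end, length_nt = longest_orf("CCATGAAATGGTAACC")
print(start, end, length_nt)   # 2 14 12
print(length_nt // 3 - 1)      # 3 residues, excluding the stop codon
```

By the same codon arithmetic, the 2,643 nucleotide ORF reported in section 3.3.5 corresponds to 881 codons (2,643 / 3).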
Indeed, Western blots on human PC3, HeLa, and NTera2D1 cell lysates indicate the presence of multiple bands corresponding to the above computer predictions (Figure 3.5b). To more accurately assess the potential for translation of the Alu-derived NAIP2 ORF, we generated a NAIP:hemagglutinin fusion protein (HA:NAIPAlu) and overexpressed it in the cell lines indicated above. The recombinant protein HA:NAIPAlu is translated and migrates at 110 kDa with the putative endogenous isoform (NAIPAlu) in untransfected PC3 and HeLa cells (Figure 3.5b). It is clear that the NAIP protein isoforms are differentially expressed in the queried cell lines, but all three cell lines endogenously produce the ~160 kDa NAIPfull and ~110 kDa NAIPAlu proteins, albeit to different degrees. In the PC3 and HeLa cell lines, where HA:NAIPAlu was overexpressed, an increase in band intensity is seen compared to NAIPAlu in untransfected cells. Overall, expression of the putative NAIPAlu protein is low relative to NAIPfull in all cell lines; however, the difference is not as exaggerated in NTera2D1 cells compared to PC3 or HeLa. Finally, it appears that neither NTera2D1 nor HeLa cells express the putative ~130 kDa NAIP2 protein isoform.

Figure 3.5. Detection of novel NAIP protein isoforms. A) Diagrams of NAIPfull (top) and NAIP1/2 (bottom) are shown; speckled exons 12-14 are only encoded by NAIP2 in the reference human genome. The known NAIP TSS are indicated by bent arrows, and computational translation predicts the domain composition and mass of the resulting ORFs: NAIPfull, NAIPAlu, NAIP1/2. NAIP1 is predicted to encode a ~100 kDa protein, and NAIP2 is ~130 kDa. The BIR (Baculoviral IAP Repeat), NBD (Nucleotide binding domain), and LRR (Leucine-rich repeat) domains are indicated by circles, cylinders, and triangles, respectively. B) Western blot of NAIP in PC3, HeLa, and NTera2D1.
Endogenous expression of NAIPfull, NAIP2, NAIPAlu, and NAIP1 (top) and HA-tagged NAIPAlu (bottom) is shown in transfected and untransfected cells.

3.3.7 NAIP protein isoforms are broadly expressed in human tissues

The observation that NAIP proteins equivalent in size to all of the computer-predicted isoforms are expressed in the cell lines screened prompted a similar investigation of primary human tissues (Figure 3.6). A variety of NAIP proteins are detected in most of the tissues examined, although NAIPfull is not broadly expressed. In fact, NAIPfull was only detected in heart, skeletal muscle, and at very low levels in testis. Similarly, the ~110 kDa protein, which is expected to represent the Alu-derived NAIP ORF, is also only detected in heart and skeletal muscle. Potential NAIP2 proteins at ~130 kDa are observed almost uniformly across the tissues tested, and could correspond to NAIPGUSBP1-initiated transcripts. The subtle variation of the putative NAIP2 proteins, such as in spleen and heart, could result either from alternative start codon selection (Figure B.6) or alternative splicing of NAIP2 terminal exons. Importantly, all of the tissues screened here, other than testis, derive from one individual with unknown NAIP copy number and mRNA expression levels. Nonetheless, we demonstrate the expression of various human NAIP protein isoforms that correspond with the calculated molecular masses of the ORFs generated by alternative promoter usage.

Figure 3.6. Expression of NAIP protein isoforms in primary human tissues. Western blot analysis of a commercial, pre-transferred membrane with human proteins deriving from the tissues of one adult female, with the exception of testis. NAIP expression is shown at top, and actin levels at bottom. Mass of bands is indicated at left.

3.4 Discussion

Transposable elements were initially discovered as important factors in the regulation of gene expression in maize, and termed controlling units (McClintock, 1953).
This view of TE usefulness was contrasted by the ‘junk DNA’ hypothesis (Doolittle and Sapienza, 1980). In recent times their potential function(s) has garnered increased attention, particularly as mobile regulatory modules (Brosius, 1999; Feschotte, 2008; Hasler et al., 2007; Shankar et al., 2004). Strikingly, TEs are associated with many evolutionarily constrained regions in mammalian genomes (Lowe et al., 2007), and many conserved non-coding elements are reported to function as transcriptional enhancers (Pennacchio et al., 2006). In general, it is difficult to ascertain the extent to which TEs donate their embedded regulatory signals to cellular genes, particularly because they can impose their effects over great distances. However, bioinformatics analyses of human and mouse genomes indicate a substantial impact of TEs on cellular gene regulation; as many as 25% of genes possess TEs in their UTRs (Jordan et al., 2003; van de Lagemaat et al., 2003). Therefore, their influence on increasing the diversity of mammalian transcriptomes is likely underappreciated. The LTRs and LINEs, due to the natural presence of RNA pol II signals, are likely candidates to fulfill a regulatory role for cellular genes; dozens of known cases confirm their utility as regulatory modules (Brosius, 1999; Nigumann et al., 2002; van de Lagemaat et al., 2003). In contrast, the pol III-dependent SINEs are concentrated in gene dense regions (Lander et al., 2001; Medstrand et al., 2002), but have largely been neglected as modulators of cellular gene expression. Recent bioinformatics analyses, however, have revealed the presence of numerous RNA pol II transcription factor binding sites and hormone response elements within SINEs (Shankar et al., 2004; Tomilin, 2008), substantiating an earlier report (Norris et al., 1995). 
Notably, the primate-specific Alus (divided into the old AluJ, intermediate AluS, and young AluY subfamilies) present consensus transcription factor binding sites distributed in an age-dependent manner (Shankar et al., 2004). Interestingly, among all gene-associated Alus on chromosomes 21 and 22, older elements tend to harbour estrogen response elements and AP-1 docking sites, while younger and/or polymorphic Alus are enriched for other features, including retinoic acid response elements. In addition, important roles in mRNA polyadenylation have also been revealed for Alus and other TEs in a variety of organisms (Chen et al., 2009; Lee et al., 2008). Since Alus number >10^6 copies in the human genome, are enriched in gene-dense regions, and contain potential pol II transcriptional regulatory motifs, they could be considered the most important transcriptional regulators. For the first time it is shown here that an Alu can function as a direct promoter for a human gene. More commonly, they and other SINEs are incorporated into mRNA UTRs and coding regions as cassette exons (Brosius, 1999; Hasler et al., 2007; Makalowski, 2000; van de Lagemaat et al., 2003), facilitated by the presence of numerous splice donor and acceptor sites in the sense and antisense orientations (Makalowski et al., 1994). Examples of SINE exaptation as promoters, however, are limited and represented by a sense B1 (Lai et al., 2009) and an antisense B2 (Ferrigno et al., 2001) element in mouse. In human, an isoform of the p75TNFR gene initiates transcription from an antisense MIR SINE, with the adjacent AluJo providing an alternative translation start site (Singer et al., 2004). Furthermore, a bioinformatics analysis reports the existence of several unvalidated antisense Alu-associated TSS (van de Lagemaat et al., 2003).
Here, broad transcription of NAIP isoforms from exapted antisense AluJb and AluSg elements is demonstrated in a number of tissues, but it is unknown whether these sequences would also be functional in the sense orientation. The Sg and Jb exaptations associated with NAIP transcription belong to older families that exhibit 10% and 15% divergence from their consensus sequences, respectively. Remarkably, NAIPJb-associated transcripts are more highly expressed than full-length isoforms in many tissues, but NAIPSg levels are at the limit of detection. We further demonstrate that the Alu-initiated NAIP transcripts extend to the 3’ terminus, and that the associated ORF, harbouring only the NBD and LRRs, is translated in a variety of cell lines and primary human tissues. Our findings also indicate that the other predicted novel NAIP proteins may be expressed, in addition to the BIR-less isoform directly assessed here. It is notable that the tissue blot we screened derives, with the exception of testis, from one adult individual, reported by the manufacturer to have been an accidental fatality. An earlier analysis of pooled primary human tissue samples using a different antibody also revealed similar NAIP protein isoforms, which were speculated to arise by alternative splicing (Maier et al., 2007). Nonetheless, the data presented here substantiate transcriptome analyses that reveal alternative promoter usage as an important source of alternative mRNAs and proteins (Carninci et al., 2006; Wang et al., 2008). The NAIP gene first rose to prominence when it was cloned as a putative disease allele for the neurodegenerative disorder Spinal Muscular Atrophy (SMA) (Roy et al., 1995), but it is now understood to influence the severity of SMA, which is induced by the adjacent SMN gene (Lefebvre et al., 1995). Its identification did seed discovery of the Inhibitor of Apoptosis Protein (IAP) family in animals (Liston et al., 1996).
The IAPs sequester activated caspases, the agents of cell death, via their signature N-terminal BIR domains (Liston et al., 2003). Interest in NAIP was renewed through the discovery that polymorphism of the murine Naip5 (Birc1e) copy solely determines permissiveness of Legionella pneumophila replication in host macrophages (Diez et al., 2003). Human Legionella infections result in Legionnaires’ disease, a severe type of pneumonia (McDade et al., 1977). It was recently shown that human NAIP also blocks L. pneumophila replication in cell lines and primary cells, suggesting a common function (Vinzing et al., 2008). NAIP-dependent sensing of cytosolic microbial patterns is LRR-dependent, and is currently known to respond to Legionella and Salmonella typhimurium flagellin (Ren et al., 2006). These and other findings point to an important role in the innate immune response, and justify the inclusion of NAIP in the NLR superfamily (Harton et al., 2002). Invariably, the NLRs possess a central NBD and C-terminal LRRs; collectively they survey the cytosol for pathogen-associated molecular patterns and elicit the appropriate response (Fritz et al., 2006).

While the potential functions of the novel NAIP protein isoforms are unknown, there are several possibilities. Firstly, NAIP proteins are known to homo-oligomerize via their NBD (Davoodi et al., 2004); therefore, expression of BIR-truncated isoforms and their subsequent interaction with NAIPfull could be a mechanism whereby its anti-apoptotic properties are effectively dispersed among a greater number of cytosolic molecules. Alternatively, these could function as dominant negative proteins that serve to regulate the amount of anti-apoptotic NAIP molecules active in a given cell.
Finally, expression of NAIP protein isoforms could represent a new example of innovation within the innate immune system, whereby hetero-oligomerization of NLRs creates diversity among these cytosolic sensors, analogous to the Natural Killer cell inhibitory receptor repertoire (Raulet et al., 2001). Indeed, NBD-mediated heterotypic interactions of some NLRs, including NAIP, have been demonstrated (Damiano et al., 2004). Moreover, NAIP was also shown to co-precipitate with its closest homologue, ICE protease activating factor (Ipaf) (Zamboni et al., 2006). Together these proteins activate Interleukin converting enzyme (ICE or caspase 1), and initiate caspase 1-dependent cell death in response to cytosolic flagellin (Molofsky et al., 2006; Ren et al., 2006; Zamboni et al., 2006). Although caspase 1 is required to cleave the inflammatory cytokines proIL-1β and proIL-18 into their active forms, their involvement in this process remains unresolved. Interestingly, and perhaps not coincidentally, the cellular processes affected by IL-1α (proliferation, differentiation, and apoptosis) are the same as those influenced by AP-1 transcriptional regulation (Shaulian and Karin, 2002). Genes involved in immunity tend to permit regulatory variation (van de Lagemaat et al., 2003), as do multicopy genes (Makalowski, 2000). While it is known that alternative 5’/3’ ends create genetic variation that leads to proteome evolution (Carninci et al., 2006; Johnson et al., 2003; Wang et al., 2008), the effect of Alu elements is underappreciated. Here we show that transcription from Alus generates a novel NAIP ORF that is subsequently translated, clearly indicating the effect they have not only on gene regulation, and perhaps on the establishment of transcriptional networks (Feschotte, 2008; Shankar et al., 2004), but also on proteome evolution.

3.5 References

Arcot, S.S., Wang, Z., Weber, J.L., Deininger, P.L., and Batzer, M.A. (1995). Alu repeats: a source for the genesis of primate microsatellites. Genomics 29, 136-144.
Brosius, J. (1999). RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238, 115-134.
Butler, J.E., and Kadonaga, J.T. (2002). The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev 16, 2583-2592.
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C.A., Taylor, M.S., Engstrom, P.G., Frith, M.C., et al. (2006). Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 38, 626-635.
Chen, C., Ara, T., and Gautheret, D. (2009). Using Alu elements as polyadenylation sites: A case of retroposon exaptation. Mol Biol Evol 26, 327-334.
Chen, Q., Baird, S.D., Mahadevan, M., Besner-Johnston, A., Farahani, R., Xuan, J., Kang, X., Lefebvre, C., Ikeda, J.E., Korneluk, R.G., et al. (1998). Sequence of a 131-kb region of 5q13.1 containing the spinal muscular atrophy candidate genes SMN and NAIP. Genomics 48, 121-127.
Damiano, J.S., Oliveira, V., Welsh, K., and Reed, J.C. (2004). Heterotypic interactions among NACHT domains: implications for regulation of innate immune responses. Biochem J 381, 213-219.
Davoodi, J., Lin, L., Kelly, J., Liston, P., and MacKenzie, A.E. (2004). Neuronal apoptosis-inhibitory protein does not interact with Smac and requires ATP to bind caspase-9. J Biol Chem 279, 40622-40628.
Diez, E., Lee, S.H., Gauthier, S., Yaraghi, Z., Tremblay, M., Vidal, S., and Gros, P. (2003). Birc1e is the gene within the Lgn1 locus associated with resistance to Legionella pneumophila. Nat Genet 33, 55-60.
Doolittle, W.F., and Sapienza, C. (1980). Selfish genes, the phenotype paradigm and genome evolution. Nature 284, 601-603.
Economou, E.P., Bergen, A.W., Warren, A.C., and Antonarakis, S.E. (1990). The polydeoxyadenylate tract of Alu repetitive elements is polymorphic in the human genome. Proc Natl Acad Sci U S A 87, 2951-2954.
Ferrigno, O., Virolle, T., Djabari, Z., Ortonne, J.P., White, R.J., and Aberdam, D. (2001). Transposable B2 SINE elements can provide mobile RNA polymerase II promoters. Nat Genet 28, 77-81.
Feschotte, C. (2008). Transposable elements and the evolution of regulatory networks. Nat Rev Genet 9, 397-405.
Fritz, J.H., Ferrero, R.L., Philpott, D.J., and Girardin, S.E. (2006). Nod-like proteins in immunity, inflammation and disease. Nat Immunol 7, 1250-1257.
Harton, J.A., Linhoff, M.W., Zhang, J., and Ting, J.P. (2002). Cutting edge: CATERPILLER: a large family of mammalian genes containing CARD, pyrin, nucleotide-binding, and leucine-rich repeat domains. J Immunol 169, 4088-4093.
Hasler, J., Samuelsson, T., and Strub, K. (2007). Useful 'junk': Alu RNAs in the human transcriptome. Cell Mol Life Sci 64, 1793-1800.
Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R., and Shoemaker, D.D. (2003). Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302, 2141-2144.
Jordan, I.K., Rogozin, I.B., Glazko, G.V., and Koonin, E.V. (2003). Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet 19, 68-72.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. (2002). The human genome browser at UCSC. Genome Res 12, 996-1006.
Kim, D.D., Kim, T.T., Walsh, T., Kobayashi, Y., Matise, T.C., Buyske, S., and Gabriel, A. (2004). Widespread RNA editing of embedded alu elements in the human transcriptome. Genome Res 14, 1719-1725.
Kozak, M. (1987). An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res 15, 8125-8148.
Lai, C.B., Zhang, Y., Rogers, S.L., and Mager, D.L. (2009). Creation of the two isoforms of rodent NKG2D was driven by a B1 retrotransposon insertion. Nucleic Acids Res 37, 3032-3043.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.
Lee, J.Y., Ji, Z., and Tian, B. (2008). Phylogenetic analysis of mRNA polyadenylation sites reveals a role of transposable elements in evolution of the 3'-end of genes. Nucleic Acids Res 36, 5581-5590.
Lefebvre, S., Burglen, L., Reboullet, S., Clermont, O., Burlet, P., Viollet, L., Benichou, B., Cruaud, C., Millasseau, P., Zeviani, M., et al. (1995). Identification and characterization of a spinal muscular atrophy-determining gene. Cell 80, 155-165.
Lev-Maor, G., Ram, O., Kim, E., Sela, N., Goren, A., Levanon, E.Y., and Ast, G. (2008). Intronic Alus influence alternative splicing. PLoS Genet 4, e1000204.
Liston, P., Fong, W.G., and Korneluk, R.G. (2003). The inhibitors of apoptosis: there is more to life than Bcl2. Oncogene 22, 8568-8580.
Liston, P., Roy, N., Tamai, K., Lefebvre, C., Baird, S., Cherton-Horvat, G., Farahani, R., McLean, M., Ikeda, J.E., MacKenzie, A., et al. (1996). Suppression of apoptosis in mammalian cells by NAIP and a related family of IAP genes. Nature 379, 349-353.
Lowe, C.B., Bejerano, G., and Haussler, D. (2007). Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc Natl Acad Sci U S A 104, 8005-8010.
Maier, J.K., Balabanian, S., Coffill, C.R., Stewart, A., Pelletier, L., Franks, D.J., Gendron, N.H., and MacKenzie, A.E. (2007). Distribution of neuronal apoptosis inhibitory protein in human tissues. J Histochem Cytochem 55, 911-923.
Makalowski, W. (2000). Genomic scrap yard: how genomes utilize all that junk. Gene 259, 61-67.
Makalowski, W., Mitchell, G.A., and Labuda, D. (1994). Alu sequences in the coding regions of mRNA: a source of protein variability. Trends Genet 10, 188-193.
McClintock, B. (1953). Induction of Instability at Selected Loci in Maize. Genetics 38, 579-599.
McDade, J.E., Shepard, C.C., Fraser, D.W., Tsai, T.R., Redus, M.A., and Dowdle, W.R. (1977). Legionnaires' disease: isolation of a bacterium and demonstration of its role in other respiratory disease. N Engl J Med 297, 1197-1203.
Medstrand, P., van de Lagemaat, L.N., and Mager, D.L. (2002). Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res 12, 1483-1495.
Molofsky, A.B., Byrne, B.G., Whitfield, N.N., Madigan, C.A., Fuse, E.T., Tateda, K., and Swanson, M.S. (2006). Cytosolic recognition of flagellin by mouse macrophages restricts Legionella pneumophila infection. J Exp Med 203, 1093-1104.
Nakagawa, S., Niimura, Y., Gojobori, T., Tanaka, H., and Miura, K. (2008). Diversity of preferred nucleotide sequences around the translation initiation codon in eukaryote genomes. Nucleic Acids Res 36, 861-871.
Nigumann, P., Redik, K., Matlik, K., and Speek, M. (2002). Many human genes are transcribed from the antisense promoter of L1 retrotransposon. Genomics 79, 628-634.
Norris, J., Fan, D., Aleman, C., Marks, J.R., Futreal, P.A., Wiseman, R.W., Iglehart, J.D., Deininger, P.L., and McDonnell, D.P. (1995). Identification of a new subclass of Alu DNA repeats which can function as estrogen receptor-dependent transcriptional enhancers. J Biol Chem 270, 22777-22782.
Pennacchio, L.A., Ahituv, N., Moses, A.M., Prabhakar, S., Nobrega, M.A., Shoukry, M., Minovitsky, S., Dubchak, I., Holt, A., Lewis, K.D., et al. (2006). In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499-502.
Raulet, D.H., Vance, R.E., and McMahon, C.W. (2001). Regulation of the natural killer cell receptor repertoire. Annu Rev Immunol 19, 291-330.
Ren, T., Zamboni, D.S., Roy, C.R., Dietrich, W.F., and Vance, R.E. (2006). Flagellin-deficient Legionella mutants evade caspase-1- and Naip5-mediated macrophage immunity. PLoS Pathog 2, e18.
Romanish, M.T., Lock, W.M., van de Lagemaat, L.N., Dunn, C.A., and Mager, D.L. (2007). Repeated recruitment of LTR retrotransposons as promoters by the anti-apoptotic locus NAIP during mammalian evolution. PLoS Genet 3, e10.
Roy, N., Mahadevan, M.S., McLean, M., Shutler, G., Yaraghi, Z., Farahani, R., Baird, S., Besner-Johnston, A., Lefebvre, C., Kang, X., et al. (1995). The gene for neuronal apoptosis inhibitory protein is partially deleted in individuals with spinal muscular atrophy. Cell 80, 167-178.
Schmutz, J., Martin, J., Terry, A., Couronne, O., Grimwood, J., Lowry, S., Gordon, L.A., Scott, D., Xie, G., Huang, W., et al. (2004). The DNA sequence and comparative analysis of human chromosome 5. Nature 431, 268-274.
Shankar, R., Grover, D., Brahmachari, S.K., and Mukerji, M. (2004). Evolution and distribution of RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Alu elements. BMC Evol Biol 4, 37.
Shaulian, E., and Karin, M. (2002). AP-1 as a regulator of cell life and death. Nat Cell Biol 4, E131-136.
Singer, S.S., Mannel, D.N., Hehlgans, T., Brosius, J., and Schmitz, J. (2004). From "junk" to gene: curriculum vitae of a primate receptor isoform gene. J Mol Biol 341, 883-886.
Sorek, R., Ast, G., and Graur, D. (2002). Alu-containing exons are alternatively spliced. Genome Res 12, 1060-1067.
Tomilin, N.V. (2008). Regulation of mammalian gene expression by retroelements and noncoding tandem repeats. Bioessays 30, 338-348.
Tran, V.K., Sasongko, T.H., Hong, D.D., Hoan, N.T., Dung, V.C., Lee, M.J., Gunadi, Takeshima, Y., Matsuo, M., and Nishio, H. (2008). SMN2 and NAIP gene dosages in Vietnamese patients with spinal muscular atrophy. Pediatr Int 50, 346-351.
van de Lagemaat, L.N., Landry, J.R., Mager, D.L., and Medstrand, P. (2003). Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet 19, 530-536.
Vinzing, M., Eitel, J., Lippmann, J., Hocke, A.C., Zahlten, J., Slevogt, H., N'Guessan P, D., Gunther, S., Schmeck, B., Hippenstiel, S., et al. (2008). NAIP and Ipaf control Legionella pneumophila replication in human cells. J Immunol 180, 6808-6815.
Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476.
Xu, M., Okada, T., Sakai, H., Miyamoto, N., Yanagisawa, Y., MacKenzie, A.E., Hadano, S., and Ikeda, J.E. (2002). Functional human NAIP promoter transcription regulatory elements for the NAIP and PsiNAIP genes. Biochim Biophys Acta 1574, 35-50.
Zamboni, D.S., Kobayashi, K.S., Kohlsdorf, T., Ogura, Y., Long, E.M., Vance, R.E., Kuida, K., Mariathasan, S., Dixit, V.M., Flavell, R.A., et al. (2006). The Birc1e cytosolic pattern-recognition receptor contributes to the detection and control of Legionella pneumophila infection. Nat Immunol 7, 318-325.

CHAPTER 4 SIGNIFICANCE AND OUTSTANDING ISSUES

4.1 Role of TEs in NAIP gene evolution

This thesis illustrates a fascinating example of regulatory evolution by a single gene. In Chapter 2 of this work, I showed that the human and mouse NAIP genes have improbably converged on the same principle of domesticating LTR promoters. In fact, multiple events have occurred in each lineage. This remarkable observation is the first known example in which orthologous genes have independently turned to TEs to serve a regulatory role. Furthermore, usage of an LTR as the principal gene promoter, as with the mouse Naip copies, has also not previously been reported. Collectively, my findings indicate that rapidly evolving/expanding genes are more prone to experimentation with regulatory innovations, and this view is further substantiated by the findings presented in Chapter 3.
Indeed, NAIP exhibits copy number variation in both human and mouse (Chen et al., 1998; Growney and Dietrich, 2000; Schmutz et al., 2004), and I show that at least three of the copies in the reference human genome are transcribed, as are all B6 copies. Due to the deletion of their promoters and 5’ exons, two of the NAIP paralogues were forced to ‘recruit’ new transcriptional regulatory regions for continued expression. Among the various novel promoters identified, two stand out since they provide the first experimentally verified examples of Alu SINEs operating as gene promoters. The combined activity of all NAIP promoters results in the expression of an array of protein isoforms, also a previously unknown fact. Taken together, my findings indicate that TEs have served an important role in facilitating NAIP evolution, and this concept is discussed below.

4.2 Unequal exchange and the emergence of new genes

It is understood to be simpler for evolution to operate with pre-assembled or existing functional genetic material than to create such sequence de novo (Ohno, 1970, 1999). The principal mechanism by which new ‘evolvable’ sequence may arise is through the duplication of genes or larger chromosomal segments, whereby one of the paralogues becomes hypermutable due to the release of selective constraint. It is no surprise that TEs are an important factor in mediating duplication (Bailey et al., 2003; Kim et al., 2008), due to their high copy number and more or less random distribution. The human and mouse NAIP genes are experiencing ongoing copy number changes. As previously indicated, Naip copy number depends on the mouse strain (Growney and Dietrich, 2000), while the human genes are also present in variable numbers in different individuals (Chen et al., 1998; Schmutz et al., 2004; Tran et al., 2008).
Interestingly, in human all of the genes encoded in the SMA region (NAIP, SMN, GTF2H2, SERF1, GUSBP1) have been subjected to copy number expansion, but in mouse only the Naip genes have amplified. Another interesting feature of the organization of these syntenic regions is that while NAIP is encoded on the minus strand in both human and mouse, the genes that flank these unstable regions are encoded at opposite ends, indicating at least two ancient inversion events. Regardless, all coding-competent copies of NAIP are encoded on the minus strand, supporting a second primate-specific inversion internal to the first. On the one hand, all human SDs harbouring a NAIP copy possess TE sequence at their flanks, most prevalently Alus (personal observation). On the other hand, the precise flanks of each duplicated mouse Naip copy are not clear. In fact, as shown in Chapter 2, very few of the abundant TEs present nearby or within the Naip genes are common (Romanish et al., 2007); however, some ancient TEs are shared by paralogues and can be used to infer their phylogeny, as can their sequence identity. Nonetheless, it is difficult to reconstruct the exact mechanism that led to the initial twinning of the rodent Naip genes. It is notable that several full-length LINEs ranging from <1-15% divergence have integrated within the introns of rodent Naip copies, or the intergenic space occupied by a mNaip pseudogene. Moreover, it was recently reported that LINEs and pseudogenes significantly associate with copy number variation (Kim et al., 2008), although their role in mediating ongoing mNaip expansion must be studied further. Such an analysis can be aided by the sequencing of other inbred mouse lines or rodents, particularly the last common ancestor to mouse and rat, which also encoded two Naip copies (Shin et al., 2003).

4.3 Genomic recycling and the assembly of new genes

In addition to their potential evolution by segmental duplication, genes can also be duplicated by retroposition and 5’ and 3’ transduction (Marques et al., 2005; Moran et al., 1999; Vinckenbosch et al., 2006; Xing et al., 2006). The scattering of genetic material by these mechanisms is not without its limitations. While retroposition of a processed or unprocessed transcript results in the duplication of an entire gene, it will lack the 5’ and 3’ regulatory regions from which the gene was expressed. Based on the data presented in Chapters 2 and 3 for the human NAIP gene, as well as some unpublished observations for the mNaip2 copy, this may not be an impediment since transcription start sites are also observed within exons. However, the generality of intraexonic transcription for mammalian genes has not been investigated. Indeed, a previous study has identified that a sizable fraction of human retroposed pseudogenes show evidence of being transcribed, particularly in testis, by regulatory elements at the site of integration (Vinckenbosch et al., 2006). Rarely is an entire gene duplicated by L1- or SVA-mediated transduction. In fact, the human AMAC genes are the only known example to have been amplified by this mechanism (Xing et al., 2006). More commonly, only specific exons or regulatory regions are transduced (Han and Boeke, 2005); therefore, they must fortuitously re-integrate within or nearby existing genes in order to increase their chance of being exapted. Interestingly, NAIP is a composite gene, with its 5’ BIR domains being possibly viral in origin (Roy et al., 1995), or at the very least as ancient as single-celled eukaryotes (LaCasse et al., 1998). By contrast, the 3’ NBD and LRR domains likely have a common origin with the plant resistance, or R, proteins, also used in the defense against pathogens (Martinon et al., 2007).
Indeed, as with NAIP, the NLR signature domains can be detected in the sequenced genomes of all eutherian mammals, but not in platypus or opossum. Interestingly, the central ~500 residues of NLRC4 (or IPAF), the closest homologue to NAIP (Tschopp et al., 2003), align with ~60% amino acid identity to a hypothetical protein in frog and lizard. Therefore, the NBD signature domains may be comparatively as ancient as the BIR domains. It would be interesting to screen the genomes of reptile ancestors to pinpoint the emergence of NAIP and its paired BIR and NBD/LRR domains. It is conceivable that a process like LINE-mediated transduction juxtaposed these domain pairs, as with the TRIM5:cyclophilin fusion that occurred in owl monkey (Sayah et al., 2004). Alternatively, their union could have been mediated by recombination-based mechanisms.

4.4 One genome’s trash is another’s treasure

Two processes by which genes can become duplicated, thereby enabling evolutionary experimentation, were outlined above. Furthermore, I have suggested that TEs likely serve an important role in mediating genomic expansion and are a readily available source of genetic novelty to duplicated genes. However, the domestication of transposed elements as regulatory elements or exons can also operate on single copy genes. By operating on either single copy or duplicated genes, TEs afford the opportunity to tinker with new expression domains or protein isoforms, whilst maintaining the native form of the gene (Lin et al., 2008; Sorek et al., 2002). Indeed, TEs are a readily available source of pre-assembled regulatory modules (Brosius, 1999; Feschotte, 2008; Medstrand et al., 2005; Sorek et al., 2002).
In general, individual TEs are randomly distributed throughout the genome; however, when considering the distribution of entire families with respect to gene proximity, a clear proclivity for gene-rich regions is only observed with SINEs (Lander et al., 2001; Medstrand et al., 2002; Waterston et al., 2002). Therefore, SINEs may possess the greatest potential for affecting genes; accordingly, ~75% of human genes have at least one Alu integration (Grover et al., 2004; Han et al., 2004). Nonetheless, there exist >10^6 LINE and LTR fragments but only ~25,000 genes in the human genome (Lander et al., 2001), indicating favourable odds of at least some representation within or nearby genes. In fact, ~75% of all human genes also have an L1 integration within their non-coding regions (Grover et al., 2004; Han et al., 2004). Indeed, examples of TEs from all categories that contribute to host genome, transcriptome, and/or proteome diversification are prevalent (Brosius, 1999; Feschotte, 2008; Gotea and Makalowski, 2006; Nekrutenko and Li, 2001). Here I have detailed a particularly extraordinary example of TE promoter domestication by the human and mouse NAIP orthologues, and am unaware of any other similar example. Although it remains a possibility that NAIPfull is also transcribed from the various internal promoters (this can be addressed through discrimination of a particular nucleotide polymorphism in exon 10), it is likely that its ongoing expansion facilitated the domestication of new promoters. Consistent with these findings is the estimation that ~200 examples of LTR promoter domestication are expected within each of the human and mouse genomes (Romanish et al., 2007). However, for orthologues to domesticate unrelated LTRs as promoters, the expectation drops to a single case.
The fact that both human and mouse have adopted multiple LTRs within their own lineages, in combination with the additional data on Alu-mediated human NAIP transcription, suggests that there is something unique about this gene. Perhaps it experiments with TE promoters because it is experiencing rapid evolution (Cao et al., 2008) and is part of two large gene families, as previously suggested in a global bioinformatics analysis (van de Lagemaat et al., 2003). Whether this is a general mechanism employed by any rapidly evolving gene warrants further study; orthologous genes that are experiencing copy number variation, as is NAIP, would seem like a logical starting point for such an analysis. Therefore, a detailed analysis of the transcription start sites of genes in such regions, using either bioinformatics or experimental methods or both, can address this issue.

4.5 Two outstanding issues

In the following section I will introduce two major questions that arise from this thesis work. Firstly, I will discuss the mode of transcriptional regulation of NAIP orthologues in mammals other than primates or rodents. Secondly, I speculate as to whether or not TE-promoter domestication is a general mechanism utilized by recently amplified genes, as is the case with NAIP.

4.5.1 Transcription of NAIP orthologues

An unresolved issue involves the transcriptional regulation of NAIP orthologues in other mammalian species. Indeed, all sequenced primate genomes encode a single NAIPfull copy, as well as the TEs that were shown to be promoters in human. While transcriptional initiation of NAIP has not been studied in these species, it would be interesting to learn if the described regulatory innovations are human-specific. A NAIP orthologue is present in the marmoset (New World Monkey) genome, but the downstream-most MER21C and ERV-P LTRs are deleted.
This suggests that testis-specific augmentation of NAIPfull transcription does not occur in New World Monkeys; however, use of the MER21C LTR and/or upstream sequence (Romanish et al., 2007; Xu et al., 2002) still may serve a function in its transcription. It would also be interesting to determine if more distantly related primates such as tarsiers and lemurs also possess homologous 5’ UTRs. By contrast, the 5’ UTRs belonging to NAIP orthologues in rodents are completely unrelated to those in its sister group, the primates, indicating a rapid evolution of the regulatory regions since their divergence. Whereas mouse and rat utilize ORR1E LTR promoters, it is not known if this is the case in other rodents. It would be interesting to examine a rabbit genome, should it become sequenced, since rabbits form a sister group with rodents. Nonetheless, the sequenced genome of a more recently diverged rodent, the guinea pig (UCSC Genome Browser), reveals two candidate copies: one with 78% similarity across ~4 kb of the coding exons and a potential 5’ deleted form that aligns with ~83% similarity across the 3’ coding exons. It is likely that the 5’ and 3’ UTRs are not detectable due to the increased rate of substitution in rodents (Gibbs et al., 2004; Waterston et al., 2002) and because of the low sensitivity of the BLAT tool (UCSC Genome Browser). I also report that the upstream region of the paralogous Naip copy in guinea pig is repeat-poor, although this is likely a reflection of the incomplete repeat library or increased rate of substitution in rodents (Waterston et al., 2002). Lastly, the existence of a NAIP gene in both primates and rodents indicates that their last common ancestor also had ‘fused’ BIR and NLR domains. Therefore, when in evolution these domains became juxtaposed is an interesting question.
Indeed, orthologues of NAIP can be detected in cows and horses, which indicates that the last common ancestor of the primate/rodent clade and its sister group containing cows, horses, dogs, and cats had also already undergone this fusion. However, only exons 2 and 10 can be detected in the cat genome, and their juxtaposition and the presence of the other NAIP exons are unclear, likely due to the incomplete assembly of this genome. It is expected, however, that the NAIP fusion event occurred well before the radiation of mammals, despite no detectable homology in the platypus or opossum genomes, since ~60% amino acid similarity to the BIR- and NBD-encoding exons (3-9) is found in the Xenopus tropicalis (frog) and Anolis carolinensis (lizard) genomes. Finally, a poor understanding of NAIP CNV exists for non-human and non-rodent mammals, but this may be more a reflection of the resources utilized in assembling these genomes.

4.5.2 Experimentation with TEs by amplified genes

A second interesting question that arises from my work is whether ‘recently’ expanded gene families or genes that are undergoing rapid evolution are more likely to domesticate TEs. As indicated in Chapter 1, TE domestication could occur either as regulatory innovations or as protein alterations. While several examples of TE incorporation into the coding sequence of genes have been reported (Gotea and Makalowski, 2006; Singer et al., 2004; Tang et al., 2000), an in-depth analysis has not addressed whether these cases are biased toward duplicated genes. One interesting example in cows reveals that a paralogue of the Bcnt gene has exonized a LINE fragment, which results in altered protein conformation (Iwashita et al., 2003). With the availability of numerous sequenced genomes, and the likely release of others, a plethora of data already exists that could be screened for instances of TE-donated promoters and TSSs.
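As an illustration only, the kind of screen described above can be sketched in a few lines of Python. This is a minimal sketch and not the method used in this thesis: it assumes that TSS positions (for example from CAGE, ESTs, or 5’ RACE) and repeat annotations have already been loaded as simple records, and the names `Repeat`, `TSS`, and `te_derived_tss`, as well as all coordinates and gene labels, are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Repeat(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # '+' or '-'
    name: str    # repeat family label, e.g. "AluJb"


class TSS(NamedTuple):
    chrom: str
    pos: int     # 0-based position of the first transcribed base
    strand: str  # strand of the transcript
    gene: str


def te_derived_tss(
    tss_list: List[TSS], repeats: List[Repeat]
) -> List[Tuple[TSS, Repeat, str]]:
    """Return (TSS, repeat, orientation) for every TSS lying inside a repeat.

    Orientation is 'sense' when the transcript and the repeat are on the
    same strand, and 'antisense' otherwise.
    """
    hits = []
    for tss in tss_list:
        for rep in repeats:
            inside = tss.chrom == rep.chrom and rep.start <= tss.pos < rep.end
            if inside:
                orientation = "sense" if tss.strand == rep.strand else "antisense"
                hits.append((tss, rep, orientation))
    return hits


# Invented coordinates and gene labels, for illustration only.
example_repeats = [
    Repeat("chr5", 1000, 1300, "+", "AluJb"),
    Repeat("chr5", 5000, 5400, "-", "MER21C"),
]
example_tss = [
    TSS("chr5", 1100, "-", "geneA"),  # inside the Alu, opposite strand
    TSS("chr5", 2000, "-", "geneA"),  # outside any annotated repeat
    TSS("chr5", 5399, "-", "geneB"),  # last base of the LTR, same strand
]

for tss, rep, orientation in te_derived_tss(example_tss, example_repeats):
    print(tss.gene, tss.pos, rep.name, orientation)
```

The nested loop is chosen for readability; at genome scale an interval index or a sorted sweep would be the more practical choice. Such a screen can only be as complete as the repeat annotation and TSS data supplied to it.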
Some potential sources of data that can be utilized for this sort of analysis include: syntenic regions that are undergoing copy number expansion (as observed with human and mouse NAIP); lineage-specific gene families; or regions of segmental duplication. Gene expression databases such as expressed sequence tag (EST), paired-end ditag (PET), and cap analysis of gene expression (CAGE) can provide information about TE-domestication events in the above-mentioned regions. However, as shown with NAIP in human and mouse, many examples of unannotated TE-derived TSSs exist, and are most easily discovered by an unbiased approach such as 5’ RACE. Another limitation of searching for TE-derived TSSs in other mammals is the incomplete nature of repeat libraries in these species.

4.6 Broad implications of these findings

In the penultimate section of this thesis I will discuss two major implications of these findings for future research. Firstly, I will revisit earlier hypotheses that implicate TEs in the establishment of transcriptional networks. Secondly, I will bring attention to the potential role of TEs in facilitating proteome diversification.

4.6.1 TEs and transcriptional networks

The role of TEs in establishing transcriptional networks, through the exaptation of their embedded pol II regulatory signals, was first suggested 40 years ago (Britten and Davidson, 1969). However, confirming these hypotheses has proved to be more difficult. Relatively recent findings that demonstrate the use of TE-derived promoters or transcription factor binding sites by single genes have certainly renewed efforts toward this goal (Bourque et al., 2008; Wang et al., 2007; Zemojtel et al., 2009). The availability of genome sequence, coupled with the emergence of new experimental techniques, has finally permitted the investigation of such questions.
Bioinformatics analyses have substantiated, on a genomic scale, early experiments indicating that Alus carry hormone response elements that function in the transcriptional regulation of particular genes (Tomilin, 2008). The same bioinformatics studies also identified consensus binding sites for other transcription factors, such as AP-1, p53, and AML, within Alus (Polak and Domany, 2006; Shankar et al., 2004; Zemojtel et al., 2009). Other reports indicate that at least some L1s carry RUNX3, SRY, and YY1 binding sites within their 5’ UTRs (Athanikar et al., 2004; Tchenio et al., 2000; Yang et al., 2003). Computer-based predictions, while useful for provoking hypotheses, do not verify which potential sites are actually used in vivo. Such verification has been enabled by the development of chromatin immunoprecipitation (ChIP) and related techniques. Recent experiments demonstrate that TEs harbour a plethora of functional in vivo binding sites for such TFs as p53, ER, Oct4/Sox2, and CTCF (Bourque et al., 2008; Wang et al., 2007; Zemojtel et al., 2009). As ChIP techniques become more advanced and the cost of sequencing decreases, the amount of data that will be generated is tantalizing. Regulatory networks can be investigated in a variety of contexts, such as cancer and development. In fact, a notable report indicates that most of the transcripts in oocytes and early embryos are LTR retrotransposons, and that a significant fraction of these form chimeras with host genes (Peaston et al., 2004). This result indicates that the oocyte-specific transcriptome in mice favours the use of LTR promoters, but an investigation into the TFs responsible has not been completed.

4.6.2 Role of TEs in facilitating proteome diversification

Some potential roles of the NAIP protein isoforms were discussed in Chapter 3.4.
Since the observed changes to NAIPfull involved the N-terminal BIR domains, the relevance of these isoforms in innate immunity is particularly intriguing. Indeed, NAIP is a rapidly evolving gene, as indicated by CNV, TE-mediated regulatory innovations, and its more diverged BIRs relative to other IAPs (Cao et al., 2008; Romanish et al., 2007; Schmutz et al., 2004). The juxtaposition of domains that define two different protein families is further evidence of this fact. It is well established that rapidly evolving genes, such as those involved in immunity or the response to external stimuli, are more likely to expand in copy number (Redon et al., 2006) or to experiment with TE promoters (van de Lagemaat et al., 2003). In fact, expansion of immunity-associated genes appears to be a hallmark of this class (Martinon et al., 2007; Raulet et al., 2001), presumably in an effort to broaden the range of targets to which they can respond. Therefore, investigating the usage of TEs in facilitating diversification of the proteome, particularly for rapidly evolving genes, is a worthwhile pursuit. Several bioinformatics studies have sought to address this issue (Gotea and Makalowski, 2006; Nekrutenko and Li, 2001), but only scored instances where the native ORF has increased in size due to TE exonization. However, it is established that TEs can also disrupt an ORF in other ways, such as by introducing a premature termination codon or polyadenylation signal. As shown here, the careful scrutiny of a particular locus is an ideal way to identify TE-induced novel protein isoforms. Perhaps, if all TE-directed transcript and ORF variants are considered, an abundance of new examples will emerge. Realistically, it is a challenge to study the array of alternative splicing and TSSs associated with every human gene, and then to determine their effect on the protein.
However, a good starting point may be thoroughly investigated regions such as the 1% of the human genome covered by the Encyclopedia of DNA Elements (ENCODE) project. Expansion of gene paralogues is followed by ‘attempts’ to stay relevant, and over time drift will result in alterations to coding regions, as with the mouse Naip copies (Endrizzi et al., 2000). In fact, it is known that only one B6 copy is responsible for L. pneumophila clearance (Diez et al., 2003). But for recent expansions, where the protein products are identical in all respects, the initial strategy may involve experimentation with regulatory changes, as with human NAIP. Indeed, highly conserved genes do not exhibit inflated Ka/Ks ratios, and ‘evolve’ using a comparatively slow molecular clock. On the other hand, NAIP is rapidly evolving and provides an interesting example with which to gain an understanding of how the early, and arguably most rapid, stages of evolution progress.

4.7 Concluding remarks

In light of accumulating genome and transcriptome sequencing studies, mammalian genes must be considered in the context of their surroundings (Carninci et al., 2006; Wang et al., 2008). Remarkably, exonic sequence comprises only ~1.5% of the human (and mouse) genomes, in contrast to the ~25% occupied by introns (Lander et al., 2001). Transposed elements, by comparison, ‘fill out’ a far larger fraction of mammalian genomes; approximately half of the human and mouse genomes are TE-derived (Lander et al., 2001). The paradigm that genomic TEs are junk DNA, since most are immobile (Doolittle and Sapienza, 1980; Orgel and Crick, 1980), has gradually shifted to an acceptance of their role in facilitating gene and genome evolution (Faulkner et al., 2009; Feschotte, 2008; Kazazian, 2004; Medstrand et al., 2005). The purpose of this work was to investigate the role of TEs in facilitating the evolution of the rapidly evolving NAIP genes.
These results and observations were discussed in the context of mechanisms that are known to catalyze the evolution of new genes, such as segmental duplication, exon shuffling, and acquisition of regulatory innovations. In the example of NAIP evolution, it is clear that all three of these processes have been in operation, raising the question of the generality of these mechanisms in the emergence/diversification of genes.

4.8 References

Athanikar, J.N., Badge, R.M., and Moran, J.V. (2004). A YY1-binding site is required for accurate human LINE-1 transcription initiation. Nucleic Acids Res 32, 3846-3855.
Bailey, J.A., Liu, G., and Eichler, E.E. (2003). An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet 73, 823-834.
Bourque, G., Leong, B., Vega, V.B., Chen, X., Lee, Y.L., Srinivasan, K.G., Chew, J.L., Ruan, Y., Wei, C.L., Ng, H.H., et al. (2008). Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res 18, 1752-1762.
Britten, R.J., and Davidson, E.H. (1969). Gene regulation for higher cells: a theory. Science 165, 349-357.
Brosius, J. (1999). RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238, 115-134.
Cao, L., Wang, Z., Yang, X., Xie, L., and Yu, L. (2008). The evolution of BIR domain and its containing proteins. FEBS Lett 582, 3817-3822.
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C.A., Taylor, M.S., Engstrom, P.G., Frith, M.C., et al. (2006). Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 38, 626-635.
Chen, Q., Baird, S.D., Mahadevan, M., Besner-Johnston, A., Farahani, R., Xuan, J., Kang, X., Lefebvre, C., Ikeda, J.E., Korneluk, R.G., et al. (1998). Sequence of a 131-kb region of 5q13.1 containing the spinal muscular atrophy candidate genes SMN and NAIP. Genomics 48, 121-127.
Diez, E., Lee, S.H., Gauthier, S., Yaraghi, Z., Tremblay, M., Vidal, S., and Gros, P. (2003). Birc1e is the gene within the Lgn1 locus associated with resistance to Legionella pneumophila. Nat Genet 33, 55-60.
Doolittle, W.F., and Sapienza, C. (1980). Selfish genes, the phenotype paradigm and genome evolution. Nature 284, 601-603.
Endrizzi, M.G., Hadinoto, V., Growney, J.D., Miller, W., and Dietrich, W.F. (2000). Genomic sequence analysis of the mouse Naip gene array. Genome Res 10, 1095-1102.
Faulkner, G.J., Kimura, Y., Daub, C.O., Wani, S., Plessy, C., Irvine, K.M., Schroder, K., Cloonan, N., Steptoe, A.L., Lassmann, T., et al. (2009). The regulated retrotransposon transcriptome of mammalian cells. Nat Genet.
Feschotte, C. (2008). Transposable elements and the evolution of regulatory networks. Nat Rev Genet 9, 397-405.
Gibbs, R.A., Weinstock, G.M., Metzker, M.L., Muzny, D.M., Sodergren, E.J., Scherer, S., Scott, G., Steffen, D., Worley, K.C., Burch, P.E., et al. (2004). Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493-521.
Gotea, V., and Makalowski, W. (2006). Do transposable elements really contribute to proteomes? Trends Genet 22, 260-267.
Grover, D., Mukerji, M., Bhatnagar, P., Kannan, K., and Brahmachari, S.K. (2004). Alu repeat analysis in the complete human genome: trends and variations with respect to genomic composition. Bioinformatics 20, 813-817.
Growney, J.D., and Dietrich, W.F. (2000). High-resolution genetic and physical map of the Lgn1 interval in C57BL/6J implicates Naip2 or Naip5 in Legionella pneumophila pathogenesis. Genome Res 10, 1158-1171.
Han, J.S., and Boeke, J.D. (2005). LINE-1 retrotransposons: modulators of quantity and quality of mammalian gene expression? Bioessays 27, 775-784.
Han, J.S., Szak, S.T., and Boeke, J.D. (2004). Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature 429, 268-274.
Iwashita, S., Osada, N., Itoh, T., Sezaki, M., Oshima, K., Hashimoto, E., Kitagawa-Arita, Y., Takahashi, I., Masui, T., Hashimoto, K., et al. (2003). A transposable element-mediated gene divergence that directly produces a novel type bovine Bcnt protein including the endonuclease domain of RTE-1. Mol Biol Evol 20, 1556-1563.
Kazazian, H.H., Jr. (2004). Mobile elements: drivers of genome evolution. Science 303, 1626-1632.
Kim, P.M., Lam, H.Y., Urban, A.E., Korbel, J.O., Affourtit, J., Grubert, F., Chen, X., Weissman, S., Snyder, M., and Gerstein, M.B. (2008). Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Res 18, 1865-1874.
LaCasse, E.C., Baird, S., Korneluk, R.G., and MacKenzie, A.E. (1998). The inhibitors of apoptosis (IAPs) and their emerging role in cancer. Oncogene 17, 3247-3259.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.
Lin, L., Shen, S., Tye, A., Cai, J.J., Jiang, P., Davidson, B.L., and Xing, Y. (2008). Diverse splicing patterns of exonized Alu elements in human tissues. PLoS Genet 4, e1000225.
Marques, A.C., Dupanloup, I., Vinckenbosch, N., Reymond, A., and Kaessmann, H. (2005). Emergence of young human genes after a burst of retroposition in primates. PLoS Biol 3, e357.
Martinon, F., Gaide, O., Petrilli, V., Mayor, A., and Tschopp, J. (2007). NALP inflammasomes: a central role in innate immunity. Semin Immunopathol 29, 213-229.
Medstrand, P., van de Lagemaat, L.N., Dunn, C.A., Landry, J.R., Svenback, D., and Mager, D.L. (2005). Impact of transposable elements on the evolution of mammalian gene regulation. Cytogenet Genome Res 110, 342-352.
Medstrand, P., van de Lagemaat, L.N., and Mager, D.L. (2002). Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res 12, 1483-1495.
Moran, J.V., DeBerardinis, R.J., and Kazazian, H.H., Jr. (1999). Exon shuffling by L1 retrotransposition. Science 283, 1530-1534.
Nekrutenko, A., and Li, W.H. (2001). Transposable elements are found in a large number of human protein-coding genes. Trends Genet 17, 619-621.
Ohno, S. (1970). Evolution by gene duplication (New York, Springer-Verlag).
Ohno, S. (1999). Gene duplication and the uniqueness of vertebrate genomes circa 1970-1999. Semin Cell Dev Biol 10, 517-522.
Orgel, L.E., and Crick, F.H. (1980). Selfish DNA: the ultimate parasite. Nature 284, 604-607.
Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D., and Knowles, B.B. (2004). Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 7, 597-606.
Polak, P., and Domany, E. (2006). Alu elements contain many binding sites for transcription factors and may play a role in regulation of developmental processes. BMC Genomics 7, 133.
Raulet, D.H., Vance, R.E., and McMahon, C.W. (2001). Regulation of the natural killer cell receptor repertoire. Annu Rev Immunol 19, 291-330.
Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W., et al. (2006). Global variation in copy number in the human genome. Nature 444, 444-454.
Romanish, M.T., Lock, W.M., van de Lagemaat, L.N., Dunn, C.A., and Mager, D.L. (2007). Repeated recruitment of LTR retrotransposons as promoters by the anti-apoptotic locus NAIP during mammalian evolution. PLoS Genet 3, e10.
Roy, N., Mahadevan, M.S., McLean, M., Shutler, G., Yaraghi, Z., Farahani, R., Baird, S., Besner-Johnston, A., Lefebvre, C., Kang, X., et al. (1995). The gene for neuronal apoptosis inhibitory protein is partially deleted in individuals with spinal muscular atrophy. Cell 80, 167-178.
Sayah, D.M., Sokolskaja, E., Berthoux, L., and Luban, J. (2004). Cyclophilin A retrotransposition into TRIM5 explains owl monkey resistance to HIV-1. Nature 430, 569-573.
Schmutz, J., Martin, J., Terry, A., Couronne, O., Grimwood, J., Lowry, S., Gordon, L.A., Scott, D., Xie, G., Huang, W., et al. (2004). The DNA sequence and comparative analysis of human chromosome 5. Nature 431, 268-274.
Shankar, R., Grover, D., Brahmachari, S.K., and Mukerji, M. (2004). Evolution and distribution of RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Alu elements. BMC Evol Biol 4, 37.
Shin, S.W., Lee, M.Y., Kwon, G.Y., Park, J.W., Yoo, M., Kim, S.K., Oh, T.H., and Choe, B.K. (2003). Cloning and characterization of rat neuronal apoptosis inhibitory protein cDNA. Neurochem Int 42, 481-491.
Singer, S.S., Mannel, D.N., Hehlgans, T., Brosius, J., and Schmitz, J. (2004). From "junk" to gene: curriculum vitae of a primate receptor isoform gene. J Mol Biol 341, 883-886.
Sorek, R., Ast, G., and Graur, D. (2002). Alu-containing exons are alternatively spliced. Genome Res 12, 1060-1067.
Tang, W., Gunn, T.M., McLaughlin, D.F., Barsh, G.S., Schlossman, S.F., and Duke-Cohan, J.S. (2000). Secreted and membrane attractin result from alternative splicing of the human ATRN gene. Proc Natl Acad Sci U S A 97, 6025-6030.
Tchenio, T., Casella, J.F., and Heidmann, T. (2000). Members of the SRY family regulate the human LINE retrotransposons. Nucleic Acids Res 28, 411-415.
Tomilin, N.V. (2008). Regulation of mammalian gene expression by retroelements and noncoding tandem repeats. Bioessays 30, 338-348.
Tran, V.K., Sasongko, T.H., Hong, D.D., Hoan, N.T., Dung, V.C., Lee, M.J., Gunadi, Takeshima, Y., Matsuo, M., and Nishio, H. (2008). SMN2 and NAIP gene dosages in Vietnamese patients with spinal muscular atrophy. Pediatr Int 50, 346-351.
Tschopp, J., Martinon, F., and Burns, K. (2003). NALPs: a novel protein family involved in inflammation. Nat Rev Mol Cell Biol 4, 95-104.
van de Lagemaat, L.N., Landry, J.R., Mager, D.L., and Medstrand, P. (2003). Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet 19, 530-536.
Vinckenbosch, N., Dupanloup, I., and Kaessmann, H. (2006). Evolutionary fate of retroposed gene copies in the human genome. Proc Natl Acad Sci U S A 103, 3220-3225.
Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476.
Wang, T., Zeng, J., Lowe, C.B., Sellers, R.G., Salama, S.R., Yang, M., Burgess, S.M., Brachmann, R.K., and Haussler, D. (2007). Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc Natl Acad Sci U S A 104, 18613-18618.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-562.
Xing, J., Wang, H., Belancio, V.P., Cordaux, R., Deininger, P.L., and Batzer, M.A. (2006). Emergence of primate genes by retrotransposon-mediated sequence transduction. Proc Natl Acad Sci U S A 103, 17608-17613.
Xu, M., Okada, T., Sakai, H., Miyamoto, N., Yanagisawa, Y., MacKenzie, A.E., Hadano, S., and Ikeda, J.E. (2002). Functional human NAIP promoter transcription regulatory elements for the NAIP and PsiNAIP genes. Biochim Biophys Acta 1574, 35-50.
Yang, N., Zhang, L., Zhang, Y., and Kazazian, H.H., Jr. (2003). An important role for RUNX3 in human L1 transcription and retrotransposition. Nucleic Acids Res 31, 4929-4940.
Zemojtel, T., Kielbasa, S.M., Arndt, P.F., Chung, H.R., and Vingron, M. (2009). Methylation and deamination of CpGs generate p53-binding sites on a genomic scale. Trends Genet 25, 63-66.
APPENDIX A CHAPTER 2 SUPPLEMENTARY FIGURES AND TABLES

Figure A.1. Analysis of human NAIP 5’ UTR and coding region splice isoforms. Cloned RT-PCR products amplified by primers specific for the two alternative LTR-derived transcripts are shown. Panel A represents RT-PCR products specific for the HERV-P-driven form (form ii, Figure 2.1a). The arrows show locations of primers used for quantitative real-time RT-PCR. Panel B represents products from the MER21C-associated form (form iii, Figure 2.1a). Recruitment of a heterogeneous ERV (5’-HAL1/LINE:AluJb/SINE-3’) was detected in sequenced clones from these isoforms. We also observed occasional exclusion of the exon from which most 5’ RACE clones were found to initiate (Figure 2.1a, form i). These UTR variants could not be compared to those reported by Xu et al. (2002) as their sequences are not available. (C) Splice variants identified by RT-PCR using primers specific for coding region exons are shown. Downstream of the first coding exon, 74bp of a 102bp remnant of an antisense MIRm SINE is recruited into the coding region of human NAIP in peripheral blood leukocytes (PBLs). While verified by direct sequencing only in PBLs, we infer transcription of this isoform in all tissues because the same band is seen in all lanes of our expression profiling experiment (Figure 2.4a, top bandpanel O). This isoform does not preserve the established ORF (+292-+4503, relative to U19251), and is predicted to yield a truncated protein encoding only the first and part of the second BIR domain (+292-+888, relative to U19251). However, downstream of the intervening MIRm SINE we report a predicted ORF (+919-+4578, relative to U19251) initiating at a start codon in-frame with the standard one (+292, U19251) that retains part of the second BIR and the entire third BIR, followed by the expected NBS and LRR motifs.
Another minor isoform splices out the second coding exon, also disrupting the normal ORF, but utilizes an in-frame start codon to yield a novel predicted peptide (+993-+4412, relative to U19251) encoding the third BIR, and NBS and LRR motifs. In all diagrams, black boxes indicate non-repeat-derived exons and coloured boxes are repeat-derived exons with their identities labeled above. ATG denotes the accepted initiation codon for NAIP, and AS=antisense.

Figure A.2. Analysis of mNaip 5’ UTR and coding region splice isoforms. (A) Cloned RT-PCR products amplified by primers specific for transcripts initiating within the ORR1E LTR are shown. The size of the ORR1E exon shows some variability among mNaip copies. Only mNaipa/b utilize a second, downstream exon within their 5’ UTRs (labeled 2). mNaipb also demonstrates recruitment of two other novel exons into its 5’ UTR, one of which utilizes partial B1F1/SINE sequence. Interestingly, we observe an mNaipe isoform that is not spliced across the length of its 5’ UTR. We are unable to comment on whether it yields a functional protein, but it might represent a primary transcript not yet processed by the splicing machinery. (B) Splice variants for each mNaip copy using primers across coding region exons are shown. Similar to human, we find recruitment of a repetitive exon into the mNaipa coding region: here, 129bp of the 5’ segment of a 554bp antisense Lx LINE remnant splices in downstream of the second coding exon. This novel exon introduces an in-frame stop codon and the resulting truncated protein (+113-+904 relative to AF135491) encodes only the first two BIR domains. In addition, a novel ORF (+1023-+4442, relative to AF135491) where the new initiation codon downstream of the intervening Lx LINE is in-frame with the standard one (+113, AF135491) could potentially be translated to encode a protein incorporating the third BIR domain followed by the NBS and LRR.
Similarly truncated proteins are expected for the isoforms of mNaipe (AF135492) and f (AF135494) that splice out the second coding exon. The C-terminal truncated peptide (+200-+847 relative to AF135491 and AF135494) terminates within the third coding exon and is predicted to encode the first and part of the second BIR. A start codon in-frame with the standard one (+200, AF13549-1/4) within the fifth coding exon yields an ORF (+892-+4311, AF13549-1/4) that encodes the third BIR, followed by the NBS and LRR. In all diagrams, black boxes indicate non-repeat-derived exons and coloured boxes are repeat-derived exons with their identities labeled above. ATG denotes the accepted initiation codon for Naip, and AS=antisense.

Table A.1. LTR insertions within the analyzed windows for all human and mouse IAP genes.

Table A.1a. Number and type of LTR insertions upstream (10 kb upstream + 2.5 kb 5’ gene sequence) of mouse IAP genes.

Gene Name           Number of LTRs   LTR type
Naipa (Birc1a)      7                RMER17A2, IAPEY3_LTR, MTD (2), MTE2b, ORR1E, RMER15
Naipb (Birc1b)      12               ORR1E, MT2A (2), RLTR20A/D, ORR1D1, ORR1D2, RLTR15, ORR1C1, RLTR14, IAPEY_LTR, RMER21A, ORR1B2
cIAP1 (Birc2)       4                RMER10B, MURVY-LTR (2), MTD
cIAP2 (Birc3)       6                ORR1C2, RLTR20A2, RMER10A, MTEa, RMER17B, MTE2a
XIAP (Birc4)        1                MT2B
Survivin (Birc5)    7                RMER4B, MTD, RMER15, MTEa, RLTR20B2/3, RMER4A/B, RMER10B
Bruce (Birc6)       1                BGLII
ML-IAP (Birc7)      0

Ts-IAP (Birc8) was omitted from this analysis because no EST or cDNA evidence exists for its transcription, despite presence on chromosome 7.

Table A.1b. Number and type of LTR insertions upstream (10 kb upstream + 2.5 kb 5’ gene sequence) of human IAP genes.

Gene Name           Number of LTRs   LTR type
Naip (Birc1)        3                MER21C, LTR9, LTR16D
cIAP1 (Birc2)       3                MER39, LTR55, MER31B
cIAP2 (Birc3)       1                LTR7
XIAP (Birc4)        2                MER31A/B, MLT1F
Survivin (Birc5)    0
Bruce (Birc6)       1                MER31A
ML-IAP (Birc7)      7                MER4D1, LTR48B, LTR29, LTR2, LTR26B, LTR1, MER41A
Ts-IAP (Birc8)      2                LTR40a, LTR10E

Table A.2. Primers and associated information.
APPENDIX B CHAPTER 3 SUPPLEMENTARY FIGURES AND TABLES

Figure B.1. Homology of human NAIP copies. Dot plots were performed to better understand the exon architecture of each NAIP copy. The NAIPfull copy in the 2006 assembly of the human genome (70,298,269-70,360,000) was compared to the genomic sequence underlying the other NAIP copies (as indicated). The coordinates of tested sequences are shown.

Figure B.2. Unequal levels of NAIP 5’ and 3’ transcription. Semi-quantitative RT-PCR was performed at a low cycle number across a panel of human tissues to determine the levels of NAIP 5’ and 3’ transcription. Red arrowheads indicate localization of the primers used in this experiment, and are shown relative to a diagram of NAIPfull, at bottom.

Figure B.3. Analysis of NAIPfull transcription. A) NAIPfull-associated TSS are shown (bent arrows) as previously described: i and ii (Romanish et al., 2007); and iii (Xu et al., 2002). Black boxes indicate exons, and labeled boxes represent LTRs (shaded) and SINEs (speckled). Coloured arrowheads indicate tiled primers used to better understand the TSS associated with NAIPfull transcription in THP1 cells (Xu et al., 2002). B) Tiled-primer experiments in the indicated primary human tissues and cell lines. The primers used are colour-coded with those shown above (A). RT-PCR of NAIP in primary tissues was Southern blotted to increase resolution, using a radio-labeled oligonucleotide specific for a region of exon 1 common to all isoforms.

Figure B.4. Sequence analysis underlying NAIP transcription start sites for the novel NAIPSg (A), NAIPJb (B), and NAIPGUSBP1 (C) regulatory regions (see page 137). cDNA sequence is shown in capitalized letters and the underlying genomic DNA (gDNA) is shown in lower case. Subscript numbers associated with green (Alu) or purple (L1) font in the gDNA track denote positions along the relevant transposable element.
All discovered transcription start sites are indicated in black bold-face, and superscript numbers in B and C represent the number of clones arising from the particular position. Vertical dashed lines in A, B, and C represent exon junctions, and slight extension of gDNA underlying exon junctions indicates the appropriate splice donor and acceptor sites. Splicing of NAIPJb clones does not occur and transcription proceeds through intervening intron 9 into exon 10. Red bold-faced letters in A and B indicate sites of RNA-editing. Potential regulatory motifs are shown relative to the lower case genomic DNA sequences as follows: TATA box – italics; Initiator sequences – overlines; Downstream promoter elements – underlines (Butler and Kadonaga, 2002); yellow, light blue, and dark blue shading denote estrogen response element, retinoic acid response element, and AP-1 binding motifs, respectively (Shankar et al., 2004).

Figure B.5. Broad transcription of novel NAIP isoforms. RT-PCR was performed to determine the breadth of expression of NAIP from the Alu and GUSBP1 3’ UTR-contained TSS, represented by bent arrows. Colour-coded arrows indicate the primers used: expression from NAIPSg is indicated by blue arrows and box; expression from NAIPGUSBP1 is indicated by purple arrows and box; and expression from NAIPJb is indicated by orange arrows and box. No splicing is observed between the AluJb transcription start site and the adjacent downstream exon; +/- RT controls indicate low, or no, contamination of genomic DNA. Diagrams are not drawn to scale.

Figure B.6. NAIP protein sequence and encoded domains. The protein sequence of NAIPfull is shown, and exon boundaries are indicated by numbers above circled arrows. Potential downstream in-frame initiation codons are indicated in red font, and the surrounding nucleotide sequence is shown beneath, with ‘atg’ in boldface.
Underlines represent start codons with a sequence context in general agreement with derived consensus sequences (Kozak, 1987; Nakagawa et al., 2008). The stop codon is denoted by an asterisk. Yellow, purple, and green highlighting indicates BIR, NBD, and LRR domains, respectively.

Table B.1. Primers used in this report.
