UBC Theses and Dissertations

Comparative studies of X inactivation within Eutheria Yen, Ziny 2005

Comparative Studies of X Inactivation within Eutheria

by ZINY YEN
B.Sc., University of British Columbia, 2003

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, MEDICAL GENETICS

THE UNIVERSITY OF BRITISH COLUMBIA
August 2005
© Ziny Yen, 2005

Abstract

X chromosome inactivation has not been well studied in mammals other than humans and mice. In both species, the inactive X expresses the XIST/Xist (X-inactivation specific transcript) non-coding RNA that is crucial for dosage compensation in females. Although both species belong to the same mammalian subclass, Eutheria, they show significant differences in imprinting patterns, negative regulation of XIST/Xist, and extent of silencing on the inactive X chromosome. Furthermore, the mechanism by which the Xist transcript coats and silences the X in cis is unknown. This study focuses on X-inactivation in other eutherians, first to unravel domains within XIST/Xist of biological significance, and second to investigate whether incomplete silencing in humans is unique within the mammalian subclass. Comparative analysis to predict conserved secondary structures between seven eutherian orthologs revealed common stems in the sequence before the Xist A repeat, the A repeat, F repeat, and exon 4. Several complex secondary structures were also similar between rodents but were not conserved in other species. These included the D repeat; structures between the B and D, as well as A and F repeats; and the unique rodent exon 5. The significance of these conserved domains in the context of potential biological functions, and how the structural differences might account for some species-specific differences, is discussed in this thesis.
To investigate the species variability in the extent of silencing, methylation analysis was performed on Zfx, Jarid1C, Crsp2, Utx, Ube1, Ar, and Fmr1 in the cow and coast mole, in addition to human and mouse. Results from this study suggest that mouse is distinct in its more complete inactivation at several loci: Zfx and Crsp2 on the evolutionarily newer part of the X, and Ube1 on an evolutionarily older part of the chromosome. In addition to evolutionary age, factors such as the position of the centromere, distance from the X inactivation centre (XIC), and presence of Y homologs failed to consistently explain or predict whether the genes on the X chromosome would escape or be subject to inactivation. Further epigenetic analysis is necessary to understand the distinct mechanisms leading to escape versus inactivation amongst different mammals.

Table of Contents

Abstract
List of Tables
List of Figures
List of Abbreviations
Acknowledgements

Chapter I Introduction to Dosage Compensation and Xist in Mammals
1.1) Mammalian Phylogenetic Tree: Metatheria, Prototheria, Eutheria
1.2) Dosage Compensation in Different Species
1.3) The X-Inactivation Process
1.3.1) Initial X Inactivation
1.3.2) Establishment of Stable Silencing
1.4) Lyon Repeat Hypothesis
1.5) The XIC and Elements Involved in Count and Choice
1.6) The Xist Sequence
1.7) Differences Between Species

Chapter II Sequence in Coast Mole Xist and Comparative Analysis
2.1) Introduction
2.2) Choice of Data Set and Bioinformatic Tools for this Study
2.2.1) Xist Regions and Species Data Set
2.2.2) Bioinformatics Tools
2.3) Results
2.3.1) Generation of Xist Data Set
2.3.2) Notable Differences in Sequence Characteristics in Different Eutherians
2.3.3) Choice of Orthologous Segments
2.3.4) Findings from Method 1 Analysis (Repeats to Anchor Orthologs)
2.3.5) Findings from Method 2 Analysis (Dotplot to Detect Orthology)
2.3.6) Control for CARNAC Output
2.4) Discussion

Chapter III Comparative Survey of Inactivation Status in Multiple Eutherians
3.1) Introduction to Genes that Escape Inactivation
3.1.1) Origin of Mammalian Sex Chromosomes
3.1.2) Genes that Escape X Inactivation
    Escape as a Consequence of Sex Chromosome Evolution
    Human vs. Mouse: To Escape or not Escape?
    Other Considerations of Escape
3.1.3) Differences Between Species: Imprinting, Methylation, and Escape
3.2) Introduction to Methylation Analysis
3.2.1) Generation Time
3.2.2) Constitutive Heterochromatin
3.2.3) Distance from the XIC
3.2.4) Evolutionary Age and X/Y Divergence
3.3) Results
3.4) Discussion
3.4.1) Evidence of X-Linked Loci in Cow and Mole
3.4.2) Implications of Factors in Escape
    Generation Time
    Constitutive Heterochromatin
    Distance from the XIC
    Evolutionary Age of the Region
    Presence of Y Homologs
    Zfx
    Crsp2
    Ube1x
    Utx
    Jarid1C
    Ar and Fmr1
3.4.3) Summary of Factors in Escape

Chapter IV General Conclusion

Chapter V Material and Methods
5.1) Polymerase Chain Reaction (PCR)
5.2) Cloning
5.3) Restriction Digest (PstI) Cloning
5.4) 5' and 3' RACE
5.5) Gel Extraction and DNA Purification
5.6) NCBI and BCM Search Launcher
5.7) Nucleic Acid Dotplots
5.8) Tandem Repeat Finder
5.9) CARNAC
5.10) RNAlifold
5.11) Mfold
5.12) R Statistical Package
5.13) Tissue Culture
5.14) RNA Extraction
5.15) Reverse Transcription
5.16) DNA Extraction
5.17) UCSC-Degenerate Primers for CpG Islands
5.18) Methylation Analysis

References

Appendix
Figures
A.1) Coast Mole Extended Exon 4 in Xist
A.2) Overview of Percent Identity Plots of Eutherian Xist Sequences
A.3) Percent Identity Plots of Human and Mouse XIST/Xist Genomic Sequences
A.4) Multiple Alignments of Xist Repeats A and F
A.5) RNAlifold of the Xist 5' end
A.6) Partial Conservation of Exon 2 in Rodents
A.7) Rodents Xist Exon 6 Partially Conserved and Consensus Structures
A.8) Xist Exon 5 CARNAC Results Without Rodents
A.9) Rodent versus Non-Rodent Combined Internal Exon Region
Tables
A.1) Multiple Species Xist Splice Junctions
A.2) Primers Used for Sequencing Coast Mole Xist
A.3) Pairwise Identities of Exons in Eutherians
A.4) Pairwise Identities for Xist Sequences Before the A Repeat

List of Tables

Table 2.1. Xist Repeats in Multiple Eutherians
Table 2.2. Summary of Randomized Vs. Original Xist CARNAC Results
Table 3.1. Degenerate Primers Used for Methylation Analysis at CpG Islands of X-linked Genes

List of Figures

1.1) Mammalian Phylogenetic Tree
1.2) X Inactivation in Mouse Female Mammalian Embryonic Development
1.3) Silencing of the X Chromosome after Blastulation in the Mouse
1.4) Timeline of events in Undifferentiated and Differentiated Embryonic Stem Cells
1.5) Critical Regions of XIST/Xist in Human and Mouse
2.1) The Theory Behind the CARNAC Algorithm
2.2) Coast Mole Xist cDNA Sequence
2.3) Summary of Tandem Repeats within Xist in Multiple Eutherians
2.4) Eutherian Xist cDNA Dotplots
2.5) Eutherian Xist cDNA Dotplots against Human Xist cDNA
2.6) Summary Diagram of Exon and Intron Structures of Xist Orthologs
2.7) Method 1 Analysis Diagram
2.8) Method 2 Breakdown of Xist Orthologs
2.9) Xist Sequence Before A Repeat
2.10) CARNAC and RNAlifold Structures for the Xist A Repeat
2.11) Xist Repeat F
2.12) The Region Between the A and F Repeat for Rodents
2.13) Rodent versus Non-Rodent, Region between A and F Repeats
2.14) D Repeat in Rodents
2.15) Summary of Method 1 CARNAC Results
2.16) Method 2, Segment 1 CARNAC Results
2.17) Method 2, Segment 4 of Xist CARNAC Results for Non-Rodents
2.18) Method 2, Segment 4 of Xist CARNAC Results for Rodents
2.19) Method 2, Segment 6 of Xist CARNAC Results
2.20) Method 2, Segment 7 and 8 of Xist CARNAC Results
2.21) Summary of Method 2 CARNAC Results
2.22) Xist Exon 4 CARNAC Results
2.23) Xist Rodent Unique Exon 5 CARNAC Results
2.24) Consensus Structure and Multiple Alignment of Exon 4
2.25) Summary of CARNAC Results in Exonic Portions
2.26) Proposed Functions of Xist Regions
3.1) Evolution of Mammalian Sex Chromosomes
3.2) Extent of Genes that Escape Inactivation in Human versus Mouse
3.3) Differences Between Mammals
3.4) Control PCRs for Methylation Analysis
3.5) Methylation Analysis of X-Linked Loci in Four Eutherians
3.6) Summary Diagram of Methylation Analysis Results
3.7) Potential Developmentally-Dependent Expression Status of Jarid1C in Cow
3.8) Comparison of Eutherian X and Y Chromosomes
5.1) Methylation Analysis

List of Abbreviations

CARNAC: Computer Alignment of RNA by Cofolding
CIP: Calf Intestinal Phosphatase
Cot-1: Concentration over Time-1
DCC: Dosage Compensation Complex
DEPC: Diethyl Pyrocarbonate
DMSO: Dimethyl Sulfoxide
DTT: Dithiothreitol
E (eg: E6.5): Embryonic Day 6.5
ES: Embryonic Stem Cell
FCS: Fetal Calf Serum
FISH: Fluorescent in-situ Hybridization
HMG: High Mobility Group
K: Lysine
LPCR: Long PCR
MEM: Minimal Essential Medium
MFE: Minimum Free Energy
MIS: Mullerian Inhibiting Substance
M-MLV: Moloney Murine Leukemia Virus
MSCI: Meiotic Sex Chromosome Inactivation
Mya: Million Years Ago
NEAA: Non-Essential Amino Acids
PAR: Pseudoautosomal Region
PCR: Polymerase Chain Reaction
PNA: Peptide Nucleic Acid
RACE: Rapid Amplification of cDNA Ends
RT: Reverse Transcription
SAP: Shrimp Alkaline Phosphatase
SDS: Sodium Dodecyl Sulfate
TAP: Tobacco Acid Pyrophosphatase
UTR: Untranslated Region
X:A: X Chromosome to Autosome Ratio
Xa: Active X Chromosome / Xi: Inactive X Chromosome
Xm: Maternal X Chromosome / Xp: Paternal X Chromosome
XIST/Xist: X Inactive Specific Transcript
XIC: X Inactivation Centre
XCI: X Chromosome Inactivation
XAR: X Added Region
XCR: X Conserved Region
XCE: X Controlling Element
XITE: X Inactivation Intergenic Transcribed Element
Gene name capitalized: Human locus. Eg: XIST
Gene name lower case: Locus in non-human species. Eg: Xist
Non-italicized gene name: product from the gene. Eg: Human XIST transcript

Acknowledgements

I would like to thank Dr. Carolyn Brown, my supervisor and my mentor, for all her inspiration and encouragement throughout my Master's program. I would like to thank all members of the Brown lab for their motivation, guidance, and support, which have made my experience at UBC all the more worthwhile.
I would like to express my gratitude to members of the Lefebvre and Robinson labs for the informative lab meetings and enjoyable social gatherings we have shared together. Members of my committee, Dr. Louis Lefebvre, Dr. Evica Rajcan-Separovic, Dr. Sally Otto, and Dr. Wyeth Wasserman, are greatly appreciated for their continuous constructive criticism and detailed review of my thesis. I am grateful to have had Sohrab Shah assist me with the randomization procedure and Tracy Tucker help me with my statistical analyses. Lastly, I would like to thank Sanja Karalic, whose past work in the lab has made this project possible.

Chapter I Introduction to Dosage Compensation and Xist in Mammals

1.1) Mammalian Phylogenetic Tree: Metatheria, Prototheria, Eutheria

All mammals belong to the phylum Chordata in the kingdom Animalia, which includes the animals with backbones, known as vertebrates. Chordates have been categorized into classes based on anatomical differences. Fish, our swimming vertebrate ancestors, emerged from Vertebrata at least 450-530 million years ago (mya), as approximated by the age of the oldest fish fossil found (Figure 1.1) (reviewed in [1]). The first amphibian, the cross-over between solely water-living and solely land-living vertebrates, arose around 375 mya [1]. Amphibians include frogs, toads, and salamanders, which are characterized by their smooth skin, cold-bloodedness, and ability to dwell in water or on land. Land-living reptiles, on the other hand, radiated approximately 350 mya, and include lizards, snakes, turtles, and crocodiles [1]. These animals are also cold-blooded and reproduce by either laying soft-shelled eggs or bearing live young. In contrast to mammals and birds (below), which are endothermic (body temperature regulated and maintained internally), reptiles, amphibians, and fish are ectothermic (body temperature primarily regulated by ambient temperatures). Flying animals in the class Aves evolved approximately 310 mya [1].
Unlike the vertebrates mentioned above, these birds are warm-blooded, have hollow bones and feathers, and hatch from hard-shelled eggs. Mammals, the fifth class in Chordata, are also warm-blooded, but they are distinguished by hair or fur, milk production, and the birth of live young [1].

Figure 1.1. Mammalian Phylogenetic Tree. A) Mammalian relationships with adjusted branch lengths, as adapted from [3]. B) Simplified version with the species of interest in this study depicted in unrooted cladogram format, based on substitutions per synonymous site relative to human rather than phylogenies within Eutheria [3]. The mole belongs to the same order, Insectivora, as the hedgehog, and the vole belongs to the same order, Rodentia, as the rat and mouse. [Divergence times labelled in the figure: fish, amphibians, and reptiles 530-350 mya; birds 310 mya; prototherians/monotremes 200 mya; metatherians/marsupials 185 mya.]

Within the class Mammalia, two subclasses exist, known as Theria and Prototheria. The prototherians (egg-laying mammals), or monotremes, are the earliest-branching mammals, diverging from the therians about 200 mya (reviewed in [2]). The name monotreme derives from the term "one-holed," referring to the single opening that serves as the urinary tract, anus, and reproductive tract in these mammals. Three species of monotremes exist, currently restricted to Australia and New Guinea: the duck-billed platypus (Ornithorhynchus) and the two spiny anteater species, or echidnas (Tachyglossus and Zaglossus) (reviewed in [2]). Monotremes exhibit both ancestral reptilian and mammalian traits, as they reproduce by laying tiny eggs with a leathery shell (less than 2 cm long), but nourish their offspring with milk produced from a gland on their belly.
Like other mammals, monotremes possess hair, but have a low metabolic rate and a temperature slightly below that of therians (placental mammals). The internal temperature of a platypus is 30°C, compared to the constant body temperature of 35-39°C in other mammals [2].

The therians, on the other hand, are further subdivided into two infraclasses: the metatherians and eutherians. The metatherians (pouched mammals), or marsupials, diverged from prototherians 130-185 mya [2, 3]. They have short gestational times relative to eutherians, due to a yolk-like placenta in the mother, and give birth approximately a month after fertilization. After birth, the marsupial embryo continues to develop, for weeks or months depending on the species, in the mother's pouch, as is apparent in the kangaroo or opossum [2, 3]. Unlike metatherians, the eutherians (wrongly coined "placentals," since all therians possess placentae) possess an allantoic placenta and exhibit long gestational times, nourishing the developing embryo with the mother's blood supply in utero. Hence, they give birth to well-developed young who generally feed on their mother's milk for further growth and development. Eutheria is divided into many orders, including Insectivora, Rodentia, Carnivora, Artiodactyla, and Primates, as exemplified by the mole, mouse and rat, dog, cow, and human, respectively. Their relative positions on the phylogenetic tree, based on genetic data, are shown in Figure 1.1 [3].

The phylogeny among the mammalian orders, dating back to at least 60 to 100 mya, has not been easily resolved, because the mammalian radiation occurred over a short period of time. Contrasting groupings have been derived from morphological and molecular data [4], and incongruent branching orders have been observed between molecular trees generated from different inference methods. Reconstruction of evolutionary history relies on the identification of homologous characters shared between different organisms.
Until the 1970s, this was essentially restricted to morphological or anatomical characteristics. The comparison of fossils and extant species has proved powerful to some extent, in that it has categorized the major groups of animals and plants. However, this approach is hampered by the limited number of reliable homologous characters. The availability of molecular data has increased the number of characters, thereby improving the resolving power for inferring phylogenetic relationships. However, conflicts arise between results obtained from molecular and morphological data, due to sampling bias and limited data availability. Furthermore, different trees have been generated from mitochondrial (protein-coding or RNA) versus nuclear sequences, due to differences in mutation rates, inheritance patterns of these sequences, and sampling bias of different data sets [5]. The resulting branching patterns have also depended on whether parsimony, distance, or likelihood methods were used.

Fortunately, the large amount of sequence information that has emerged in the last decade has allowed for a greater number of genes and species to be represented, as well as a combination of approaches where both mitochondrial and nuclear data are included. Phylogenomic data and inference methods such as supermatrix and supertrees have become powerful tools for accurately establishing the branching orders within Mammalia [6]. Figure 1.1A shows a recent tree using marsupials as an outgroup and phylogenies based on results from two data sets: 16.4 kb of concatenated nuclear and mitochondrial exonic data [4] using a supermatrix method, and over 12 Mb of genomic sequence based on rare genomic changes such as transpositions, insertions, and deletions [7]. Furthermore, the trees generated from these two data sets were cross-validated using more than one inference approach and received high bootstrap support for the resulting branches.
I consider Figure 1.1A to be a reliable tree (perhaps the most accurate to date), as the data are rich in representative species and sequence information and the tree relied on multiple phylogenomic inference methods [3].

1.2) Dosage Compensation in Different Species

Invertebrates utilize a variety of mechanisms for dosage compensation. C. elegans, for example, accounts for the genetic imbalance between XX hermaphrodites and XO males by downregulating both X chromosomes in the hermaphrodite (reviewed in [8, 9]). A multi-protein complex that assembles on both X chromosomes presumably alters chromatin structure to reduce transcriptional activity. The X chromosomes are apparently distinguished from the autosomes by a 793 bp element which acts as a recruiting site on the X chromosome, allowing the complex to first bind and then establish silencing of neighboring regions [10]. Drosophila melanogaster (fruit fly), on the other hand, hypertranscribes the single X in XY males to achieve equivalency with the XX gene dosage in females. The upregulation requires the dosage compensation complex (DCC), including two non-coding RNAs (roX1 and roX2) involved in chromosomal targeting and a variety of proteins that remodel chromatin, including a histone H4 acetyltransferase and a histone kinase (reviewed in [11]).

Within vertebrates, it has been hypothesized that birds dosage compensate for the genetic imbalances between males and females by upregulating the single Z in females, achieved by a locus on the W chromosome, which maintains the Z:autosome (Z:A) ratio at 1 [12]. Inactivation of one Z chromosome for dosage compensation does not occur, as Z-borne genes show expression from both Z chromosomes [13, 14]. The repetitive sequence (MHM, male hypermethylated) on the Z chromosome that is hypomethylated and transcribed in ZW females appears to be up-regulated by a factor on the W chromosome [12].
The non-coding female-specific MHM transcript accumulates at the MHM locus on the Z chromosome and likely represses the nearby Dmrt1 [12]. Thus, Dmrt1 is not expressed from the otherwise upregulated Z chromosome in females; its expression would have led to male sex determination. A female-specific histone H4 lysine 16 (H4K16) acetylation is enriched on the Z chromosome in the region of the MHM locus. This is the same histone modification that is enriched on the up-regulated Drosophila melanogaster male X chromosome, where it plays a vital role in dosage compensation [15]. Hence, MHM appears to be important for both sex determination (by repressing Dmrt1 in females) and dosage compensation (because it is W-dependent and up-regulates the single Z chromosome only in females).

In marsupials, X chromosome inactivation occurs as the means of dosage compensation. All female mammals, including prototherians, display replication asynchrony between their X homologs. The inactive X replicates later than the active X chromosome, and this feature is a hallmark of silent chromatin [2]. Asynchronous replication timing in platypus and echidna occurs on part of the X chromosome in lymphocytes, but not fibroblasts, suggesting that inactivation is partial and tissue-specific [2]. Kangaroo females show late replication timing as well, but only one allele is expressed in heterozygous individuals for X-linked traits, revealing that inactivation affects only the paternal allele in marsupials [16]. The silencing is not only imprinted but appears to be unstable, for loci on the inactive opossum X are readily reactivated in culture [17]. In addition, no Xist homolog (see below) has been found in metatherians or monotremes to date [18].

Eutherian mammals silence one of the Xs per diploid set in females via a mechanism mediated by the Xist non-coding RNA.
This transcript is an alternatively spliced, polyadenylated RNA of over 17 kb (over 15 kb in mouse) that coats the inactive X, from which it is exclusively expressed, to trigger the events of silencing. Xist is necessary and sufficient to initiate the silencing cascade in cis [19, 20]. In humans, inactivation is random in all tissues, whereas cattle and mice show paternal inactivation in extraembryonic tissues and random inactivation elsewhere. The next section discusses the eutherian X-inactivation process as it occurs in mouse, because most early embryo experiments have been conducted in this species.

1.3) The X-Inactivation Process

1.3.1) Initial X Inactivation

In mouse, inactivation occurs exclusively on the paternal X in the trophectoderm, whereas random inactivation occurs in the majority of the inner cell mass, with the exception of the primitive endoderm. The mouse paternal X chromosome is inactivated very early, at either the two- or four-cell stage after fertilization, as supported by Xist coating coupled with lack of Cot-1 staining (a marker of nascent transcription), as well as absence of elongating RNA polymerase II at the inactive X, respectively (Figure 1.2) [21, 22]. The erasure of imprints must occur in the epiblast after blastulation, so that random inactivation can occur in the embryo proper. Consistent with this, Xist coating is lost at the blastocyst stage [22]. It is unclear whether the paternal X is inherited in a preinactivated state by retaining features of heterochromatin from the meiotic sex body formation in male spermatogenesis [22-24], or whether both Xs are initially active followed by early inactivation of the X bearing the paternal imprint in all cells [25].

Although it is possible that the paternal X retains imprints from meiotic sex chromosome inactivation (MSCI) in the father, there are major differences between the two types of X inactivation processes.
One obvious difference is that MSCI is Xist-independent (although Xist is present), as mice with deleted Xist loci are still fertile and able to form X-Y bodies during pachytene [26, 27]. A second difference is that the inactive Xs in each case carry different epigenetic marks: the H4 histones of the X involved in MSCI are not hypoacetylated [28], and a different histone H2A variant, the phosphorylated H2AX, accumulates on the sex chromatin [29]. The role of MSCI seems drastically opposite to that of somatic X inactivation. In the former case, transient silencing of the X results in a large dosage imbalance between females and males. MSCI may serve to silence genes detrimental to spermatogenesis or to allow for more efficient synapsis in males for proper segregation during gametogenesis [30].

Figure 1.2. X Inactivation in Mouse Female Mammalian Embryonic Development. The two leftmost columns describe the stage of embryonic development, while the two rightmost columns depict and describe X inactivation associated events at the chromosomal level, respectively [25, 32-37]. [Stages shown: female zygote, 2-cell, 4-cell, 8-cell, 16-cell morula, 32-cell blastocyst. Events listed, in order: paternal X partially inactive; lack of late replication timing; partial Xist coating; Cot-1 negative; distance-dependent silencing (biallelic expression of loci distant to XIC); no RNA pol II association; H3K4 hypomethylation and H3K9 hypoacetylation on inactive Xp; H3K9 and K27 methylation on inactive Xp; macroH2A recruitment; H3K9 dimethylation on inactive Xp.]

1.3.2) Establishment of Stable Silencing

The paternal X in the mouse early embryo is only partially inactivated. With fluorescent in situ hybridization (FISH), the mouse paternal X shows only a small signal of Xist localization, indicating partial Xist coating. Monoallelic expression of X-linked genes occurs at loci close to the Xic, but biallelic expression is still seen at loci situated at a greater distance [25].
Histone H3K4 (lysine 4) hypomethylation and H3K9 hypoacetylation occur at the 8-cell embryo stage, followed by H3K9 methylation by G9a [31] and K27 methylation by Eed/Ezh2 (mPRC2), thought to be important for establishing chromosomal memory of the inactive state [32]. MacroH2A recruitment is a late event, occurring in the 16-cell morula, followed by H3K9 dimethylation in the 32-cell blastocyst (Figure 1.2) [25]. At the blastocyst stage, cells of the mouse trophectoderm and primitive endoderm show stabilized paternal X inactivation, marked by complete Xist coating and monoallelic expression of loci independent of distance from the Xic (Figure 1.3). On the other hand, reactivation of the paternal X occurs in the inner cell mass of the implanted blastocyst, accompanied by loss of Xist coating and associated proteins, and the mechanisms of count and choice occur to establish random inactivation (Section 1.5 below).

In undifferentiated embryonic stem (ES) cells, silencing is reversible and dependent on continuous Xist expression, possibly because the silenced state lacks hallmarks such as histone hypoacetylation and/or DNA methylation, which might be important for maintenance (Figure 1.4) [33]. X inactivation in undifferentiated stem cells is not associated with late replication of the inactive X. Upon the onset of differentiation, Xist coating occurs, followed by histone H3 hypoacetylation at K9 and K27, H3K4 hypomethylation, H3K9/K20/K27 hypermethylation, and H2A ubiquitinylation by the polycomb protein Ring1b [32, 34-38]. Thus, methyltransferases, acetylases, and polycomb proteins accompany Xist in remodeling the chromatin to establish silencing. This inactive chromatin is maintained synergistically by later modifications including H4 hypoacetylation, macroH2A recruitment, and DNA methylation [39, 40]. After two days of differentiation, silencing becomes irreversible, and subsequent expression of Xist fails to induce silencing in mice [41].
Figure 1.3. Silencing of the X Chromosome after Blastulation in the Mouse. The establishment of random inactivation in the embryo proper and paternal inactivation in the extraembryonic tissues in the mouse [25, 65]. [Labels in the figure: blastocyst; trophectoderm (and primitive endoderm): stabilized silencing on inactive Xp, non-random inactivation; epiblast and embryonic stem cells: loss of Xist and proteins on Xp, erasure of imprints, count and choice mechanisms, random inactivation of Xm or Xp, Xist coats future inactive X (BRCA1-dependent).]

Figure 1.4. Timeline of Events in Undifferentiated and Differentiated Embryonic Stem Cells [31-41]. A) Silencing characteristics in undifferentiated and differentiated stem cells in mouse. List of events related to X inactivation in chronological order after differentiation of embryonic stem cells. The green arrow signifies the point at which silencing becomes irreversible. [Labels in the figure: undifferentiated ES cells: silencing is Xist dependent and reversible, no histone hypoacetylation or late replication on Xi; upon induction of differentiation: Xist coating, H3K27 and K9 hypoacetylation, H3K4 hypomethylation, H3K9, K20, K27 hypermethylation, H2A ubiquitination by Ring1b; 2-days post-differentiated ES cells: silencing is Xist independent and irreversible; maintenance: H4 hypoacetylation, macroH2A recruitment, DNA methylation.]

1.4) Lyon Repeat Hypothesis

Because LINE-1s (long-interspersed nuclear elements) comprise 30% of the X chromosome (two times the average abundance in the genome), they are postulated to serve as "way-stations" or "booster" elements to help propagate the silencing signal on the X [42, 43]. In support of this hypothesis, LINE-1s are found at a higher density within 20 kb of random monoallelically expressed loci compared to biallelically expressed loci [44]. L1 elements are also enriched on the long arm of the X, where most genes are subject to inactivation [45].
This enrichment is observed across three eutherian clades [46]. However, this correlation could simply be a remnant of evolution rather than reflect an active role in enhancing the Xist signal along the X chromosome. The mechanism by which LINE-1 works is unknown, but intrachromosomal pairing of L1s possibly amplifies the inactivation signal [43]. Although most L1 elements are repressed in the genome via methylation at their promoter, L1s are likely hypomethylated prior to X-inactivation, such that intrachromosomal pairing could occur. This has been supported by the observation that L1 methylation on the inactive X does not occur until after L1 methylation on the active X, being mediated by a distinct methyltransferase [47].

1.5) The XIC and Elements Involved in Count and Choice

In mammals, one X is kept active per diploid autosome set and all other Xs present in the same cell are inactivated. This "counting" phenomenon is clearly demonstrated in tetraploids, which retain two active X chromosomes, and triploids, which carry either one or two active Xs. Females trisomic for the X chromosome, on the other hand, retain only one active X, suggestive of autosomal involvement (reviewed in [48]). How exactly the cell counts the number of Xs is unknown, but a 20 kb bipartite domain downstream of Xist (see below) appears to be involved [49, 50].

After determining the number of Xs to remain active, there is a choice between which X homolog per diploid set should be inactivated. This decision is stably inherited from each cell to its progeny. The mechanism of "choice" is complex and involves multiple elements of the Xic, including a roughly defined X controlling element (Xce) 3' of Xist that leads to complete non-random inactivation when deleted [51, 52], and Tsix, located 12 kb downstream of Xist, which negatively regulates the Xist RNA, both by its overlapping transcription and antisense sequence [53].
Absence of the Tsix promoter leads to preferential inactivation of the mutant X allele, whereas upregulation of Tsix expression leads to the opposite outcome [54-56]. Deletions of Xite, which lies upstream of Tsix, reveal that it is a positive regulator of Tsix, suggesting that this locus may be equivalent to the previously described Xce [57].

Once the count and choice processes are established, the inactivation mechanism itself requires the X inactivation centre (Xic) on the X chromosome, a region originally mapped to a 1 Mb region (Xq13) by human X;autosome translocations and refined to a 450 kb region by mouse transgenic studies. The Xic encodes a number of non-coding RNAs, including Xist.

1.6) The Xist Sequence

Xist is necessary and sufficient to initiate silencing of the X chromosome in cis [19, 20]. In addition to the X chromosomes, autosomes are capable of being silenced, though less completely and stably, as evidenced by unbalanced X;autosome translocations leading to apparently normal phenotypes [58]. The same autosomal segment can either escape or be inactivated, and the spread of inactivation can be discontinuous or continuous depending on the rearrangement [59]. The chromatin environment changes the outcome of the inactivation process, and the Xist RNA must somehow be functionally constrained to work most effectively in its original context. The domains that are important for recognizing the X chromosome in cis, effectively propagating the silencing signal, and recruiting chromatin remodelers to the future inactive X are largely unknown. Because structure is closely connected to function, some attempts have been made to predict structures of biological significance within Xist using comparative sequence analysis. Three-way comparisons involving mouse-human-vole or mouse-human-cow XIST/Xist sequences have in general revealed low primary sequence conservation [19, 60, 61].
The XIST/Xist sequence conservation between human and mouse is 60-70% in gap-free regions, although some regions show up to 80% identity [19, 20]. Notably, there are many regions that are unique to either of the species. Although potential open reading frames (ORFs) were detected in the mouse and human sequences, they were less than 600 bp (483 bp in human and 576 bp in mouse), a length likely to be observed by chance alone [19, 62]. The most highly conserved blocks were located at the 5' end of the gene around transcript positions 250-800 bp in both species, consisting of 43-59 bp tandem repeats separated by AT-rich spacers (coined the "A" repeat), as well as a C-rich repeat 1 kb downstream (coined the "B" repeat) [19]. XIST/Xist has up to six sets of tandem repeats, comprising about 50% of the entire processed transcript [20, 63].

Surprisingly, the global sequence similarity between mouse and vole was found to be only 57%, although the two rodents branched into different families just 15-25 mya [63]. This estimate was not much higher than the conservation between human and vole, found to be 49% overall. Comparison of exonic regions between cow, human, and mouse further confirmed that the sequence similarity between XIST/Xist orthologs was lower than would be expected for protein coding regions (previously established to be 85% by comparing nearly 2000 unique rodent versus human mRNAs) [60]. The study detected 66% sequence identity between human and mouse and 62% between mouse and cow. These values are even lower than the expected sequence conservations of 5' (67-79% identity) and 3' UTRs (69-74% identity) between rodent and human mRNAs [60]. Based on these analyses, there appears to be low evolutionary pressure to maintain the primary gene sequence of Xist, perhaps because the precise DNA sequence is less important than the secondary structure that the RNA adopts for functioning in X inactivation.
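The identity values quoted above are stated for gap-free regions of an alignment. As a minimal sketch of how such a figure can be computed, assuming two sequences that have already been aligned with "-" as the gap character (the function name and the toy sequences are illustrative, not taken from the cited studies):

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment, made up for illustration (not real XIST/Xist sequence).
seq_one = "ACGT-ACGTTAG"
seq_two = "ACGAAAC-TTCG"
print(percent_identity(seq_one, seq_two))  # 8 matches in 10 gap-free columns -> 80.0
```

Whether gapped columns are excluded or counted as mismatches changes the result, so identity values from different studies are comparable only when the same convention is used.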
Despite poor global DNA similarity, there are several regions of high local similarity within exon 1 and exon 6 between the human, mouse and the four vole species. These include the six tandem repeat elements, designated A-F [19, 20, 63]. Notably, there are differential insertions and truncations, specific to each species. In human, the B repeat is interrupted by an insertion such that it is split in comparison to the rodents [63]. In rodents, the C repeat (115 bp per monomer) has been truncated in repeat length but greatly expanded in copy number, relative to the single copy in human [19, 63]. The D repeat (290 bp per monomer), which spans over 3 kb in mouse and voles, is further expanded in human [19, 20, 63]. The last E repeat (T-rich), which is situated within exon 6, seems to be variable in the vole, human, and mouse sequences, but always occurs within the 5' end of the second biggest exon [48].

The repetitive elements of Xist have been speculated to form secondary structures important for binding DNA or proteins necessary for dosage compensation [19]. Transgenic studies using deleted variants of Xist cDNA introduced into mouse ES cells have revealed that Xist localization involves the cooperation of the redundant repeat sequences, A, B, D, E, F and parts of C. Silencing involves the A (5' conserved) repeat (Figure 1.5) [64]. The copy number of this repeat is important, as a minimum of 5.5 copies is necessary to achieve inactivation of genes along the mouse X chromosome [64]. A threshold number of repeat copies might be required to form structures important for recruiting proteins that establish silent chromatin. On the other hand, macroH2A recruitment requires both repeats D and E, but silencing via repeat A can occur in the absence of D and E, confirming that macroH2A recruitment is a late event that is not essential for initial inactivation [64].

MacroH2A is not the only protein that is recruited to the inactive X. Other domains of Xist must interact with key players such as BRCA1, the only protein thus far shown to be necessary for the localization of Xist [65]. G9a [31], Eed/Ezh2 [36, 38, 66], Ring1b [34, 67], RNA polymerase II, histone/DNA modifiers or polycomb group proteins [37] are other candidate interacting proteins. Way stations such as L1s or LTRs that play a role in enhancing the silencing signal of Xist might also bind to the RNA. In addition, it has been recently demonstrated that Xist interacts with the SAF-A scaffold protein to constrain the RNA within the nucleus near the X chromosome [68]. Finally, the stability of Xist could result from RNA structures that prevent degradation.

Figure 1.5. Critical Regions of XIST/Xist in Human and Mouse. Functional regions of XIST/Xist established by deletion studies or PNA interference mapping in human or mouse [64, 69, 87; personal communication, Jennifer Chow]. Repeats A-D are within exon 1 of XIST/Xist and repeat E is within exon 6 of human XIST or exon 7 of mouse Xist.

Because secondary structures may be important for the function of Xist despite poor DNA sequence conservation, comparative sequence analysis with a greater number of eutherian species will shed insight into RNA structures that are common and indicative of significant biological functions. Xist RNA structures have been investigated in human and mouse. With the Mfold software, the A repeat was predicted to fold into a stable hairpin 1-hairpin 2 structure in both human and mouse (Figure 1.5) [64].
In addition, human exon 4 (mouse exon 5), which shows better sequence similarity relative to the rest of the RNA, was predicted to fold stably into a long hairpin structure in each species [69]. The biological relevance of this conserved domain is currently unknown, as deletions of this region in Xist did not alter localization or inactivation ability [69]. However, a decrease in the steady-state level of the mutant compared to the wild-type RNA was seen, suggestive of an effect on stability or processing of Xist [69].

1.7) Differences Between Species

Within eutherians, differences arise in the X-inactivation process. The antisense regulation of Xist is different between mouse and human. Although the Tsix transcript overlaps completely with the Xist locus in mouse, only a truncated form is found in human that does not overlap with XIST completely and lacks exons and promoter regions characteristic of Tsix [70, 71]. In addition, the critical CpG island methylation site of DXPas34 that is believed to regulate Tsix expression (important for imprinting) is absent in both human and cow, although cow still shows paternal inactivation in extraembryonic tissues [72]. Furthermore, continued expression of TSIX is coincident with XIST expression in human and is found at low levels, whereas mouse Xist is down-regulated on the chromosome that highly expresses Tsix (the future active X) during early development [73]. In female cow somatic tissues (differentiated), Tsix and Xist are co-expressed in the same cells and maintained in the adult, again arguing against Tsix's role in antisense regulation in this species [74].

Xist itself is different between eutherians. Variations in sequence and secondary structure between human XIST and mouse Xist might account for the observed differences in binding affinity. The mouse Xist is retained on metaphase chromosomes, as opposed to human XIST, which is released in prophase [73].
Species-specific factors in X inactivation might explain the inability of human XIST to localize to the human X in human-mouse somatic cell hybrids [75]. Examining the Xist sequence in multiple eutherians will provide insight into how Xist has evolved in different lineages, leading to variations in function and in the proteins involved in the X-inactivation pathway. Such studies form the basis of Chapter 2. Another large difference between human and mouse is the extent of silencing along the X chromosome. This topic will be addressed in detail in Chapter 3.

Chapter II
Sequencing Coast Mole Xist and Comparative Analysis

2.1) Introduction

Xist must interact with a wide array of trans-acting players and cis-elements in order to bring about the inactive state of one X chromosome in mammalian females. The apparently simple outcome of stable X inactivation is accomplished by a complex sequence of events. Important interacting players include chromatin modification and restructuring proteins, regulatory and recognition machinery, and molecules which initially mark the chromosomes for silencing (see Chapter 1). How Xist might interact with these candidates and how it manages to coat the inactive X in cis is still poorly understood. Genetic studies have demonstrated that silencing is distinct from Xist coating (Section 1.6), as Xist localization on its own is insufficient to cause X inactivation [64]. Furthermore, localization seems to require species-specific autosomal factors, since human XIST cannot coat the human X chromosome in a human-mouse hybrid [75]. Apparently in this case, the human complement of the factors necessary to guide XIST to the corresponding X is absent, and the complete set of mouse chromosomes in the hybrid fails to replace this function. Thus, the autosomal factors that help Xist to localize might only recognize the chromosomes of their own species.
How Xist might structurally interact with these localization factors is addressed in this study. As mentioned previously (Section 1.6), localization involves redundant domains of Xist in the mouse model [64], but whether this statement generalizes to other eutherians is unknown. Xist sequence differences between species might lead to altered localization, as the mouse Xist transcript is able to localize and remain tethered to the mouse inactive X beyond metaphase, whereas human XIST falls off during this phase of cell division [73]. Even though the human and mouse transcripts must be dissimilar to account for observable differences in binding affinity, when human YAC XIST transgenes are expressed in mouse ES cells, the human XIST transcript is still able to coat and partially inactivate the mouse autosome from which it is transcribed [76]. This indicates that the transcripts are interchangeable to some extent for localization and silencing. Presumably, there are regions in common between human and mouse XIST/Xist transcripts to bring about these functions, but these sites have not been clearly delineated and it is unknown whether the same regions are present in other eutherians.

The Lyon hypothesis proposes that L1 elements, which are enriched on the X chromosome relative to the rest of the genome, might act as "booster-elements" to propagate the spread of the silencing signal along the X [42]. Questions of how Xist might interact with these cis-elements and what domains are necessary to cause silencing remain to be answered. If the A repeat that is necessary for silencing is what interacts with L1 or other important cis-elements, then this repeat should hypothetically be found in all eutherians. Certainly, as structure is tightly tied to function, especially for non-coding RNAs, obtaining secondary structure information for XIST/Xist would be useful to understand how the transcript works.
Unfortunately, the large size of XIST/Xist (>15 kb in both cases) has made it difficult to determine structure with methods such as NMR, X-ray crystallography, chemical digestions, denaturing temperature differences in paired vs. unpaired regions, and footprinting.

The purpose of this study is to compare orthologous Xist sequences from different eutherians in order to find important domains of the transcript. The global primary sequence conservation, consensus secondary structure prediction from aligned sequences, and common structures found in all species will be useful to identify significant regions. Furthermore, the gene structure of Xist in terms of exon/intron properties and repeats has been assessed across the multiple species to gain insight into the evolution of the primitive Xist transcript. Differences between species will help to explain variation of Xist function within Eutheria.

2.2) Choice of Data Set and Bioinformatic Tools for this Study

2.2.1) Xist Regions and Species Data Set

Initial comparative sequence analyses of Xist concentrated on the conservation of DNA sequence, rather than secondary structures. Since Xist functions as a non-coding RNA, evolutionary pressures have presumably constrained the proper folding of the transcript, rather than primary sequence, to ensure efficient function. In other words, nucleotides can diverge substantially at the primary sequence level, as long as compensatory mutations occur to preserve the RNA structure. Additionally, repetitive regions may fold into similar structures, despite having poor sequence conservation. Since repeats carry a high potential for expansion, these repeats may serve to maintain a certain threshold of size for particular roles of the RNA. Thus,
the focus of this study is on conserved folded structures using bioinformatics, with a large emphasis on the repetitiveness of Xist, which is highly conserved in the eutherians examined thus far. These analyses have assumed that conservation in diverged species implies functional significance, in order to highlight regions of biological function within the Xist molecule.

The dog, rat, and cow genomes have been sequenced, and their data are available. Additionally, I have sequenced ~14.5 kb of the coast mole Xist using PCR and traditional cloning approaches (Sections 6.1-6.5). Experiments have demonstrated that the processed Xist transcript is sufficient for coating and silencing. Hence it has been necessary to identify exonic boundaries within the dog, rat, cow, and coast mole sequences. Given that insectivores, carnivores, artiodactyls, rodents and primates are not clustered on the eutherian phylogenetic tree (Section 1.1 and Figure 1.1), the orthologs from the above species (coast mole, dog, cow, rat, mouse, human) have been chosen to give a representative picture of the Xist transcript in this mammalian subclass.

2.2.2) Bioinformatics Tools

To evaluate primary sequence conservation, ClustalW alignments were produced and pairwise percent identities were calculated per region of Xist, across the seven eutherian orthologs. MultiPipMaker [77] was used to generate percent identity plots of the Xist sequences relative to human and mouse XIST/Xist. Unfortunately, to date, there are no programs available that will quantitatively assess the percent conservation of RNA foldings across an entire transcript (i.e., provide global structural conservation information in RNA). The poor DNA alignment and the large size of Xist make this especially problematic because many RNA conservation prediction tools require an initial alignment and cannot handle sizes larger than a few kilobases without prohibitive computational times [78]. One problem that must be overcome is defining what is considered a "conserved RNA structure." How would an algorithm be designed to recognize a conserved RNA domain in the context of an entire complex RNA molecule?
Would a hairpin that is present in the RNA of all species be considered "conserved" and biologically significant even though it belongs to drastically different, larger predicted structures in each species? Should conservation be defined at individual hairpins/stems/loops or at complex global structures? Also, what should the assigned relative scores of conservation be based upon? Due to the existence of these problems, the present analysis of Xist RNA structure has been reduced to a qualitative one.

However, methods to draw consensus structures with alignment data are available. I have examined consensus RNA structures for regions of Xist that are conserved sufficiently in DNA sequence for proper alignment. RNAalifold was used to predict consensus structures from aligned Xist repeats. The algorithm [79] simultaneously considers both thermodynamic stability (minimum free energy) and compensatory mutations to generate an optimal RNA structure, presented in postscript format. The given overall energy of the structure is the energy averaged over all sequences in the alignment. Bonus energies for compensatory and consistent mutations are assigned, while penalties are given for predicted base-pairs of secondary structures that are not common in all sequences. Because RNAalifold can only handle 2 kb at a time with a running time of O(n^3) (proportional to the cube of the length in nucleotides), repeats whose lengths exceeded this limit due to high copy number were reduced by whole monomers until they fit the limit [79].

Constructing multiple alignments of Xist was generally infeasible due to high sequence divergence and ambiguities as to which species segments were analogous. However, for the internal exons, orthology was clear. For these segments, I initially used Mfold [80] to predict the RNA structure of single sequences based on minimum free energy.
The results were qualitatively assessed for the appearance of globally similar structures for each species. The categorization of structures as "similar" was not feasible, so this approach was abandoned. Nevertheless, some of the Mfold predicted structures are shown in the results section for comparison of images generated from different programs.

The CARNAC computational method [81, 82] was used in a more systematic attempt to find conserved structures. Unlike RNAalifold, this method requires no input alignment and hence works for diverged but related sequences. It indicates similar structures within each sequence, rather than giving one consensus structure based on all input data. One advantage is that it is able to handle up to 2 kb per input sequence at a time and is computationally simple, with a run time of a few seconds for shorter sequences (<300 nt) and a tolerable several minutes for longer input (up to 2 kb). CARNAC uses energy minimization to first predict the most stable stems in each input sequence. This is followed by detection of analogous stems from pairwise foldings of orthologs [81, 82], which is achieved by phylogenetic comparison and sequence conservation. The program performs a low-stringency alignment of the input sequences to look for short areas of sequence similarity to use as anchors for comparison. These anchors serve as contextual information so that folded structures can be compared between orthologs at corresponding locations. Stems that are similar between pairwise comparisons are then categorized into the same "components," consisting of nodes (each node represents a stem found in an ortholog) connected to one another if they are common between pairs of input sequences (Figure 2.1). Components with fully interconnected nodes equal to the number of input sequences are considered to be the most reliable stems, considered by CARNAC to be conserved structures [81, 82].
These represent stems that are present in each sequence/ortholog and are shared between any pair of orthologs. One disadvantage of CARNAC is that the output gives no quantitative measure of how confident one should be of the common stems found, but simply indicates the locations of the common stems in text (connect files, .ct) or visual formats (JPEG, .jpg; Postscript, .ps). The program finds common structures that are present in only a subset of the orthologs and also indicates when no folding commonalities can be found among the input sequences [81]. At least three different causes can block the appearance of shared structures even if they do exist. These include overly high sequence similarity (>95%) between the inputs, which does not allow the program to take advantage of compensatory mutations to infer common structural foldings; too highly diverged sequences (>50%), which does not allow the program to use locally conserved regions to compare stems; and the presence of pseudoknots, which the program cannot detect [81].

The performance of this program was previously evaluated by testing it on diverged ciliate telomerase RNA sequences (differing lengths of approximately 200-300 nt, unalignable with traditional methods) from three species known to share common structures despite weak conservation. This confirmed CARNAC's ability to recover the structure of each, complete and consistent with the reference structures in Rfam (an RNA structure database) [83], with low false positive rates. As well, CARNAC was used in cases where the RNAs shared only a partial common structure in relatively long sequences, as in the 5' UTR of three 1800 nt mRNAs from enteroviruses. The analysis showed that CARNAC predicted conserved stems only before the start codons, in accordance with known data [82].
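The stem-graph criterion described above (a component is taken as reliable when it has one node per input sequence and every pair of nodes is connected) can be sketched as follows. This is a toy illustration of the graph test only, not CARNAC's implementation: the node labels, the edge list, and the function names are hypothetical, and the stem prediction and pairwise folding steps that would produce the edges are assumed to have been done already.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes into connected components; also return the adjacency map."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components, adjacency


def reliable_components(nodes, edges, n_sequences):
    """Keep components with one stem per sequence and all pairs connected."""
    components, adjacency = connected_components(nodes, edges)
    reliable = []
    for component in components:
        sequences = {seq for seq, _stem in component}
        one_per_sequence = (
            len(component) == n_sequences and len(sequences) == n_sequences
        )
        fully_connected = all(
            b in adjacency[a] for a, b in combinations(component, 2)
        )
        if one_per_sequence and fully_connected:
            reliable.append(component)
    return reliable


# Hypothetical stems: each node is (sequence name, stem label).
nodes = [("seq1", "stemA"), ("seq2", "stemA"), ("seq3", "stemA"),
         ("seq1", "stemB"), ("seq2", "stemB")]
# An edge means the two stems were judged analogous in a pairwise folding.
edges = [(("seq1", "stemA"), ("seq2", "stemA")),
         (("seq1", "stemA"), ("seq3", "stemA")),
         (("seq2", "stemA"), ("seq3", "stemA")),
         (("seq1", "stemB"), ("seq2", "stemB"))]
print(reliable_components(nodes, edges, n_sequences=3))
```

In this toy input only the stemA component passes: stemB is shared by two of the three sequences, which corresponds to the case of a structure present in only a subset of the orthologs.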
Although the performance of CARNAC was evaluated only qualitatively, it seemed suitable to use as a tool to analyze Xist, at least to get a crude picture of conserved regions found in a number of eutherians, as a starting point for experimental assays to test functionality of these regions in the future.

Figure 2.1. The Theory Behind the CARNAC Algorithm. Stable stems are first predicted for each sequence independently, followed by co-foldings to find stems common to each pair. This information is used to build a stem graph which categorizes similar stems into the same component. The components with the number of nodes equal to the number of input sequences (signifying that the stem was found in each ortholog) as well as fully connected nodes (symbolizing that each pairwise comparison contained analogous stems) are taken to be the most reliable stems, given as CARNAC output [81, 82]. [Figure panels: 1. all potential stems; 2. pairwise foldings; 3. stem graph. Source label on figure: Nucleic Acids Research, 2004, Vol. 32, Web Server issue.]

2.3) Results

2.3.1) Generation of the Xist Data Set

The Xist cDNA sequences from dog, rat, cow, human, mouse, vole and coast mole were used for comparative analysis. Because the X chromosome sequences of dog and rat are available on NCBI but have not yet been annotated, human XIST cDNA (gi: 340393) was compared with the dog and rat genomes (blastn) to identify Xist orthologs (MapViewer, NCBI). BLAST retrieved several dog and rat supercontigs that corresponded to repetitive regions within the human XIST sequence. Since the boundaries of dog and rat Xist within these contigs were unknown, the approximate 5' and 3' ends of the gene were extrapolated based on size.
The predicted Xist ortholog sequences were then compared in dotplots against human XIST cDNA until the observed alignments spanned the entire human XIST sequence (i.e., sequences that showed alignments in excess of the human XIST sequence were truncated, and those that were too short were elongated until the plots reached full coverage). The resulting dog Xist genomic sequence consisted of two contigs that were combined (see Table A.1 for accession numbers), while the rat Xist sequence corresponded to the reverse complement of a portion of a single contig.

Since Xist cDNA sequences were necessary for the subsequent bioinformatics analysis, approximate exon/intron boundaries within the dog and rat genomic sequences were defined using dotplots against human and mouse XIST/Xist cDNA. This was followed by prediction of potential splice junctions using NNSPLICE version 0.9 (http://www.fruitfly.org/seq_tools/splice.html). The most probable of these splice sites were selected based on sequence similarity to known exons in vole, mouse and human [63], as well as consistency with estimated sites from the dotplots. Data regarding the exon and intron boundaries for the predicted dog and rat Xist are shown in Table A.1. From this information, the "introns" were removed from the Xist genomic DNA sequences to produce the virtual dog and rat cDNA sequences.

Initial coast mole Xist fragments were obtained via inverse PCR with conserved primers by a former graduate student, Sanja Karalic [84]. I sequenced the remainder of coast mole Xist via a progressive PCR approach, using primers that were mole Xist-specific and degenerate (Table A.2). Large segments within exon 1 gave multiple low intensity PCR products due to repetitiveness in the region. Gel purification of the desired PCR products followed by sequencing failed to give results due to remaining contaminants and/or low template concentration for the sequencing reactions.
This demanded that I use traditional cloning to increase the yield of the desired products for sequencing. Therefore, for these regions, I cloned either original purified PCR products (TA cloning) or PCR products that were restriction enzyme digested (creating sticky ends for vector ligation) (Figure 2.2). To sequence the internal exons, coast mole cDNA was generated from reverse transcription of RNA extracted from cultured female mole fibroblasts. Conserved primers designed from exons 2 and 4 were used for subsequent PCR amplification of the coast mole cDNA. In total, I sequenced 14.5 kb of coast mole Xist, whose alignment with other Xist sequences confirmed that the transcript has almost been entirely sequenced, with the exception of 1 kb at the 5' end and 2.5 kb at the 3' end.

2.3.2) Notable Differences in Sequence Characteristics in Different Eutherians

Disparities between eutherian Xist function could result from sequence or structural variability within the orthologs. To examine differences between eutherian sequences, Xist repeats from the seven eutherians were determined using Tandem Repeat Finder. The repeat sequences were verified by dotplot comparison and by visual inspection. Details about the repeats (consensus sequence, monomer size, copy number, percent identity, location) in the different species are listed in Table 2.1 and Figure 2.3. The dog Xist sequence contains the complete set of repeats established in human and mouse. In addition, the dog B repeat is interrupted by an insertion similar to the situation in human [63]. The locations of the dog B and C repeats are reversed compared to other eutherians, suggestive of an inversion (Table 2.1). No C repeat (truncated or full length) was detected in cow or mole Xist using BLAST, low stringency dotplot comparisons, visual inspection, or the Tandem Repeat Finder program (Table 2.1).
However, a low number of copies (1-2X) was observed in human, dog and vole, while the C repeat was expanded in mouse and rat (Table 2.1). Cow Xist contains at least two repeat elements detectable in dotplots (Figure 2.4) that were not visible in other eutherian sequences. The four non-rodents (human, mole, cow, dog) possess expanded D repeats compared to those observed in the rodents (Figure 2.5). Although all seven species display a characteristic repetitive region in the centre of the D repeat known as the D core [20] (Figure 2.5), the composition of the remainder of the D repeat differs between rodents and non-rodents. Specifically, according to dotplots, rodents lack obvious sequence

Figure 2.2. Coast mole Xist cDNA Sequence. Primers used for sequencing are shown with degenerate primers shown as purple arrows and coast mole Xist specific primers shown in black. Green numbers indicate approximate locations of the coast mole sequence relative to human XIST based on BLAST alignments. PCR product sizes are given in bp or kb. The internal exon region (exons 2-5) is magnified below for clarity. [Figure labels: exon 1 (11364 bp in human); exon 6 (4.5 kb in human); legend entries: sequenced PCR product, inverse PCR, sequenced clone, amplified with PCR, coast mole primer, human primer, conserved primer.]

Table 2.1. Xist Repeats in Multiple Eutherians. Repeat location, copy number, total length, monomer length, and percent identity are listed. The accession numbers of the Xist sequences are indicated.
Repeat definitions (table columns):
A: 1st 1000 nt in RNA; hairpin 1 + hairpin 2 + AT-rich spacer
F: ~750 nt downstream of A (discovered in vole); 16 bp motif
B: C-rich, 4-9 bp motif
C
D: 290 bp monomer
E: AT-rich, variable

Human (cDNA GI:340393, M97168; gDNA GI:45269107, U80460 (49937-82015))
A: 284-781 (8.5X) = 429 nt total; 43-59 nt monomer; 78-86%
F: 1443-1544 (2X) = 101 nt total; 42 nt monomer; 70%
B: Bh: 1975-2068 (12X) = 93 nt total; 7-9 nt monomer; 87%. B: 2809-2927 (17X) = 118 nt total; 6-9 nt monomer; 78%
C: 3045-3090 (1.9X) = 45 nt total; 83%
D: 4582-8419 (12.6X) = 3837 nt total; 289 nt monomer; 76-83%
E: 12213-13702 [copy number illegible] = 1489 nt total; 18-25 nt monomer; 69-84%
gDNA coordinates listed: 25045-26533 [remainder illegible]

Mouse (cDNA GI:202420, AJ421479; gDNA GI:21425583, AJ421479 (106332-[illegible]))
A: 414-708 (8X) = 294 nt total; 42-74 nt monomer; 75-83%
F: 1192-1531 (2X) GC = 339 nt total; 33 nt monomer; 82%
B: 2817-3000 (32X) = 183 nt total; 6-8 nt monomer; 83%
C: 3225-4673 (14X) = 1448 nt total; 119 nt monomer; 89%
D: 6381-6500 (1X) = 119 nt + 10X truncated copies = 380 nt total; 73-80%
E: 10230-11004 [copy number illegible] = 774 nt total; 20 nt monomer; 59%
gDNA coordinates listed: 395-688; 2805-2988; 3064-4512; 15159-15932

Vole (cDNA, gDNA GI:13445263, AJ310127)
A: 290-718 (8X) = 428 nt total; 40-55 nt monomer; 80-85%
F: 1450-1534 (5X) = 84 nt total; 15 nt monomer; 91%
B: 2800-2976 (30X) = 176 nt total; 6 nt monomer; 84%
C: 3003-3100 (0.8X) = 97 nt total
D: 3336-6451 (9X) = 3115 nt total; 78-83%
E: 14189-15520 [copy number illegible] = 1337 nt total; 16-22 nt monomer; 64-73%
gDNA coordinates listed: 8643-9978 [remainder illegible]

Cow (cDNA, gDNA GI:21425595, AJ421481)
A: 445-799 (8X) = 354 nt total; 42-43 nt monomer; 88%
F: 963-1579 (12X) = 616 nt total; 24 nt monomer; 70%
B: 2795-2970 (32X) = 175 nt total; 6 nt monomer; 87%
C: No repeat seen
D: 4432-16759 (48X) = 12327 nt total; 95-96 nt monomers; 58-62%
E: 9294-19774 [copy number illegible] = 480 nt total; 14-18 nt monomer; 55-60%
gDNA coordinates listed: 143833-144313
M o l e c D N A Not yet sequenced 121-550 (1 lx) = 429nt total 39nt monomer 75% 1688-1850 (10x) = 162nt total 1 Int monomer 65% No repeat seen 3353-10050 (18X) = 6072nt total 97-3 66nt monomer 76-93% 13021-13580(4X) = 559nt total 17nt 76% Dog c D N A g D N A GI: 50088291 AAEX01057775 (17368-24421bp) + AAEXO105 7774 (l-25440bp) pieced together 1 =beginning of pieced sequence 368-789 (8x) = 42Int total 44-45nt monomer 80-85% 1253-1554 (13x) = 30Int total 20nt monomer 73% Bh: 1847-1892 (3x) = 45nt total 19nt monomer 82% B: 1893-3135 (27x) = 1242nt total 6nt monomer 71% 813-2975 (1.9X truncated copies) = 2162nt total 87% 4635-7470 (31x) = 509nt total 94-96nt monomers 59-78% 14232-15411 (6X)= 1189nt total 19-25nt 56-79% 27975-29164 R a t c D N A 638-1049 (8x) = 41 Int total 45nt monomer 80-84% 1524-1849 (5x) = 325nt total 18nt monomer 82% 3094-3232 (24x) = 138nt total 6nt monomer 76% 3296-4810 (15x) = 1514nt total 98nt monomer 75% 6523-8699 (21x) = 2176nt total 103nt monomer 81% 11125-11714 (24X) = 589nt total 24nt monomer 55% g D N A GI: 34881475 N W 048043.1 (2960970-2986883bp reverse complement, where 1=2960970) 4789-5200 5675-6000 7244-7382 7447-8960 10674-12850 19768-20357 Bh C Human Dog Mole Cow Rat Mouse Vole I I I I: Bh B D C O B D D D L) D I  B D 1 I Figure 2.3. Summary of Tandem Repeats within Xist in Multiple Eutherians. Repeats A and B [19], C , D , and E [20], as wel l as F [63] are shown in the different eutherians used in the study, based on dotplots against human and mouse XIST/Xist c D N A , or their own sequences. The two pink repeats in cow seem to be species specific, whereas the two pink repeats in mole seem to be distinct sequences that form a larger D repeat structure. The B repeat and the C repeats are reversed in order in dog. Both dog and human have a split B repeat. 
28 a •: - • • Window s a p 9 HismacnLimiti-30WT-SMdi ontMpiotto oct*n coonMnja inftrmjbcn C O W DNA 1 on horizontal aws = 21320 bases DNA 2 on vertical axle • 21320 bases M O L E j c ns • d i a ontha plot to ootsin eocrcmate information D O G DNA 1 on horizontal ans = 19212 bases DNA 2 on vertical axis = 19212 bases • X'-r.i-'jr.i^r-'"." Click on plot to get positional a I'OU clrd-ac near base 1013ft in C N * 1 ana case . " IS m ON* 2 DNA 1 on horizontal ans = 11655 bases DNA 2 on vertical axis = 11S5S bases M O U S E Click on plot to get positional data Dene - elide on tne plot to obtain coordinate information DNA 1 on horizontal aas = 13841 bases DNA 2 cm vertical aws =13841 bases V O L E Click on plot to get positional data Don* - ©iff on tre p t«B ootsin wcramara infcrmacon DNA 1 on horizontal aas = 13660 bases DNA 2 on vertical ans = 13660 bases RAT Click on plot to get positional data Figure 2.4. Eu the r i an Xist c D N A Dotplots . Sequences o f Xist orthologs are aligned to themselves ro reveal repetitive regions. In general, non-rodents have expanded D repeat (yellow); rodents have more apparent C (turquoise) and E repeats (purple). 29 >ou aitteo near esse 1U03 ir> OH* 1 ana oass 13390 in DNA 2 DNA 1 on horizontal axis =14210 bases DNA 2 on vertical aws • 15525 bases DOG rff—n HUMAN «Plol Window Slap?"" COW • 4ft» w • n r X HUMAN MgjwPjtiJ Window Sua g i s lone - click on the plot tc obtain coordinate information DNA 1 on horizontal axis = 14210 bases DNA 2 an vertical axis = 13841 bases VOLE X-I I ! !. V HUMAN Mane Plot WindowSize j i ? Oona • dicK cn the plot to obtain coordinate information DNA1 on horizontal axis -1 *21 Obas9s DNA 2 on vertical axis = 13683 bases RAT : , • . . • ! • . \-.V*C5v.<:-'. • '•'.'vj'V:!'!1'^ "" I*' ' ' « P n HUMAN Mismatch l imit | -f'ou clicked near case U103 in DN*-1 and case 21229 in D N - 2 DNA 1 on horizontal axis =14210 bases DNA 2 on vertical ans = 21320 bases Vou clicked near base UQ5? 
in DH- i and base 1-5^3 in QNA2 DNA 1 on horizontal axis = 14210 bases DNA 2 on vertical axis = 14580 bases MOLE HUMAN MOUSE , , . . - . v -v HUMAN cu dichad near a w e 11933 m Dn-. 1 and base 10824 in on*z DNA 1 on horizontal axis = 14210 bases DNA 2 on vertical axis = 11655 bases Figu re 2.5. Eu the r i an Xist c D N A Dotplots against H u m a n Xist c D N A . Repeats are indicated. The D repeat (yellow) in cow and dog is expanded compared to others, whereas the repeat (turquoise) is amplified in mouse compared to others. A r r o w points at the D core [20]. 30 similarity before the D core region to human Xist c D N A (discussed below in Section 2.3.3 as segment 4) (Figure 2.5). The E repeat is not apparent in mole and dog Xist dotplots, whereas in rodents this region clearly displays expansion of a monomer (Figure 2.5). The primary sequence of the Xist E repeat differs significantly between eutherians, as the tandem repeat finder identifies species-specific consensus sequences (data not shown on Table 2.1). However, al l E repeat sequences are low in G C content and situated at the beginning o f the large 5' Xist exon. The predicted cow, mole, and dog c D N A sequences have similar exonic structure to human Xist, with a notable exception of human exon 2 (Figure 2.6). The pairwise identity of cow-human exon 2 is 14%, mole-human is 12%, and dog-human is 15% (Table A.1) . The Xist exon 2 of cow, mole, and dog resembles that of mouse and vole, with pairwise identities ranging from 63%-70% (Table A.1) . A s expected, the rat sequence is most similar to the other rodents, due to the presence of the rodent-specific exon 5 (Figure 2.6). The exon 4 sequence is the most conserved, showing pairwise identities ranging from 52-96%, compared to sequences of the other exons (pairwise identities of exon 2 is 10-85%, exon 3 is 12-81%, and exon 5 [rodent exon 6] is 8-83%) (Table A.2) . 
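The pairwise identities quoted above depend on how aligned columns are counted. As a minimal sketch only, the Python function below computes percent identity from two rows of an alignment (for example, rows exported from a ClustalW alignment). The function name `percent_identity`, the choice to count gap-versus-residue columns as mismatches, and the toy sequences are assumptions made for illustration; the text does not state which convention was used for Tables A.1 and A.2.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise or multiple alignment.

    Columns where both rows contain a gap are ignored. A gap opposite a
    residue counts as a mismatch (an assumed convention, see lead-in).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column is empty for this pair of species
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy rows, not real Xist sequence.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) would give different percentages for the same alignment, so values are only comparable when one convention is used throughout.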
Dotplot comparisons of coast mole Xist cDNA with cow or human Xist genomic DNA revealed an exon in mole bearing sequence similarity to human and cow intronic sequence (Figure A.1). Inspection of the region confirmed that it is not similar to the unique rodent exon 5 or non-rodent exon 5 sequences. Hence, mole Xist contains an additional exonic region, signifying either an elongated exon 4 or a separate exon between traditional exons 4 and 5. MultiPipMaker analysis of Xist cDNA in the seven eutherians revealed that exon 4 was the most highly conserved in primary sequence (Figure A.2). Plots generated from genomic Xist sequences did not reveal any intronic regions with consistently high percent identities between pairs of species (Figure A.3). Dog and human, as well as rat and mouse, were the most similar in intronic sequence to one another.

2.3.3) Choice of Orthologous Segments

The structures of Xist repeats A to F (Figure 2.3), as well as exons 2 to 6 (Figure 2.6), were individually predicted using CARNAC. For other parts of Xist, repeats were used to anchor the orthologs and regions between the repeats were considered related.
Figure 2.6. Summary Diagram of Exon and Intron Structures of Xist Orthologs. Mole, dog, rat, and vole Xist sequences are shown in comparison to human, mouse and cow orthologs. Figure is extended from [60]. Potential splice sites are shown for dog and rat, whereas the splice sites shown for other species have been experimentally confirmed. Regions showing sequence similarity to known exons are shaded in grey. The mole sequence has not been sequenced to completion at the 5' and 3' ends.

Because differences between repeats and sequence lengths made it difficult to distinguish analogous Xist segments across species, a second approach was used to ensure that the corresponding regions from different species were analyzed together. This approach consisted of comparing each of the species' Xist sequences to human XIST cDNA in dotplots (Figure 2.5), in order to find similar regions. These two input methods will hereafter be referred to as "Method 1" (repeat anchors) (Figure 2.7) and "Method 2" (dotplot) (Figure 2.8).
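The dotplot comparison underlying Method 2 can be illustrated with a small sketch. The Python code below is not the dotplot program used for Figures 2.4 and 2.5; it is a naive implementation assuming the usual window and mismatch-limit scheme, in which a dot is placed wherever two windows differ at no more than a set number of positions. The function name `dotplot`, its parameters, and the toy sequence are hypothetical.

```python
def dotplot(seq1, seq2, window=9, mismatch_limit=0):
    """Return (i, j) positions where seq1[i:i+window] and seq2[j:j+window]
    differ at no more than `mismatch_limit` positions.

    Comparing a sequence to itself exposes tandem repeats as short
    off-diagonal lines; comparing two orthologs exposes similar regions.
    """
    seq1, seq2 = seq1.upper(), seq2.upper()
    dots = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            mismatches = 0
            for k in range(window):
                if seq1[i + k] != seq2[j + k]:
                    mismatches += 1
                    if mismatches > mismatch_limit:
                        break
            else:
                # Inner loop finished without exceeding the limit.
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Toy tandem repeat (three copies of ACGT), not real Xist sequence.
    repeat = "ACGTACGTACGT"
    for i, j in dotplot(repeat, repeat, window=4):
        if i != j:
            print(i, j)  # off-diagonal dots mark the repeat
```

This brute-force version scales with the product of the two sequence lengths and the window size, so it is only practical for short inputs; sequences of the length compared here (over 10 kb) would call for an indexed or word-based approach.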
The segments that were analyzed in Method 2 (based on alignment patterns to human XIST cDNA) are indicated on the dotplots in Figure 2.8. These segments are designated 1-10. Segment 1 represents the sequence before the A repeat and is therefore identical to the "before A" region used in Method 1. Segment 2 is between the A and F repeats, segment 3 is between the F and B repeats, and segment 4 is between the B and D repeats. Because rodents lacked sequence similarity to human XIST in segment 4 in the dotplots, I analyzed rodent and non-rodent sequences separately for this region. The D repeat was divided into four segments, segments 5-8. The first of the four, segment 5, is characterized by expansion of a small monomer present in the human XIST sequence. Rodent dotplots did not show this pattern; hence rodent sequences were excluded from the segment 5 CARNAC analysis. Segment 6 marked a less repetitive D repeat region that displayed a clear alignment with human XIST, a characteristic that was not visible in rodent Xist dotplots. For this reason, rodent sequences were excluded from the segment 6 input. The D repeat core was present in all species' dotplots and was designated segment 7. Segment 8 represented the remainder of the D repeat, whereas segment 9 contained the sequence after the D repeat continuing to the end of exon 1. The internal exons, exons 2-5 (or 6 in rodents), were analyzed individually, followed by segment 10, which was equivalent to the region of the large 3' Xist exon containing the E repeat.

2.3.4) Findings from Method 1 Analysis (Repeats to Anchor Orthologs)

The sequence before the A repeat, ranging from 242-325 bp, displayed pairwise identities of 35%-80% (Table A.4) based on ClustalW alignment (Figure 2.9). For this region, CARNAC revealed common stems in all species sequences analyzed, ranging from 3-9 bp in length (Figure 2.9).
Likewise, the A repeat and F repeat (355-435 bp and 46-372 bp regions, respectively) yielded a few common stems (Figures 2.10 and 2.11). The A repeat stems ranged from 5 bp to 10 bp, with two consecutive A repeats contributing to a single hairpin (as opposed to two hairpins forming from a single A repeat, the hairpin 1-hairpin 2 structure [Section 1.6]). CARNAC predicted only one conserved stem within Xist repeat F, which was 4-10 bp long, depending on the species.

Figure 2.7. Method 1 Analysis Diagram. Segments that were analyzed together are shown in the same color. Fragments between repeats that were analyzed are shown in red. Orthologous sequences were analyzed up to 2 kb at a time, except the D repeat, which was divided into the first and last kb of the repeat, shown in yellow.

Figure 2.8. Method 2 Breakdown of Xist Orthologs. The Xist sequences analysed in Method 2 are designated segments 1-10. Segments 5-8 comprise the D repeat, segment 2 is between the A and F repeats, segment 3 is between the F and B repeats, and segment 4 is between the B and D repeats. Dog and vole dotplots are shown above to contrast rodents and non-rodents in their alignments with human XIST, especially for the D repeat. Segments are marked on the dotplots for clarity. + or - signify obvious or absent dotted alignments with human XIST, respectively.

Segment   1  2  3  4  5  6  7  8  9  10
Human     +  +  +  +  +  +  +  +  +  +
Cow       +  +  +  +  +  +  +  +  +  +
Dog       +  +  +  +  +  +  +  +  +  +
Mole      -  +  +  +  +  +  +  +  +  +
Mouse     +  +  +  -  -  -  +  -  +  +
Rat       +  +  +  -  -  -  +  -  +  +
Vole      +  +  +  -  -  -  +  -  +  +

Figure 2.9. Xist Sequence Before A repeat. A) ClustalW alignment of orthologs in this region. B) CARNAC structures depicted in each species. Consensus structure predicted by RNAalifold is boxed in green. C) Common stems found with CARNAC in six eutherians. Mole was excluded due to lack of sequence in this region.

Figure 2.10. CARNAC and RNAalifold Structures for the Xist A Repeat. A) Stems that are analogous in each species, as determined by CARNAC. B) Consensus structure predicted by RNAalifold. C) The stems corresponding to the top CARNAC figures.

Figure 2.11. Xist Repeat F. A) CARNAC stems conserved in the seven eutherians. B) Cow conserved secondary structure generated from CARNAC, representative of the stems in the other eutherians. C) RNAalifold consensus structure of Xist repeat F.

The multiple alignments of these two repeats (Figure A.4) were entered into RNAalifold to predict consensus structures (Figures 2.10 and 2.11). A multiple alignment of the Xist 5' end spanning these two repeats was also analyzed by RNAalifold to visualize the common underlying structure in this region (Figure A.5). RNAalifold predictions were window independent, yielding the same consensus structures for a given portion of an alignment regardless of context (Figure A.5). Hairpins displayed in the RNAalifold consensus structure were compared to common stems predicted by CARNAC. The CARNAC stems always fell into the same locations as stems from RNAalifold, but RNAalifold generally also output stems not present in the CARNAC-predicted common structures. This could reflect the different questions each program addresses: RNAalifold attempts to find the common underlying structure from an alignment, whereas CARNAC predicts stable stems in single sequences and identifies stems common to cofolded sequences. Since RNAalifold outputs consensus structures even when CARNAC is unable to detect common stems, Xist sequences were first analyzed with CARNAC. RNAalifold was utilized subsequently, if alignment between the analogous regions was possible.

For the rest of Xist, common structures were predicted only when rodent Xist sequences were analyzed separately from non-rodent sequences. CARNAC detected conserved stems in the region between the A and F repeats, as well as for the D repeat (Figures 2.12 to 2.14).
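The comparison of CARNAC stems with RNAalifold hairpins described earlier in this section can be sketched in code. The Python example below assumes, purely for illustration, that both predictions have been written as dot-bracket strings over the same coordinates; the helper names `base_pairs` and `stems` and the toy structures are hypothetical and do not represent either program's actual output.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def stems(pairs):
    """Group base pairs into stems, i.e. runs (i, j), (i+1, j-1), ..."""
    result = []
    for i, j in sorted(pairs):
        if (i - 1, j + 1) in pairs:
            continue  # interior pair of a stem that starts further out
        stem = []
        while (i, j) in pairs:
            stem.append((i, j))
            i, j = i + 1, j - 1
        result.append(stem)
    return result


if __name__ == "__main__":
    # Toy structures, not real predictions.
    consensus = "((((...))))..((..))"   # stands in for an alignment consensus
    common = "..((...)).........."[:19]  # stands in for common stems
    shared = base_pairs(common) <= base_pairs(consensus)
    extra = [s for s in stems(base_pairs(consensus))
             if not set(s) & base_pairs(common)]
    print("common stems fall inside consensus:", shared)
    print("consensus-only stems:", extra)
```

In this toy case the common stem lies within a longer consensus stem, and the consensus also contains a second stem with no counterpart, mirroring the pattern reported in the text.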
Between the A and F repeats, the predicted common structures were complex, showing a large number of stems, when rodent or non-rodent sequences were considered separately. This is presumably because some of the structural similarities reflect close evolutionary relationships rather than conservation due to biological significance. The RNAalifold consensus structures resulting from multiple alignment of the region between the A and F repeats differed between the non-rodent and rodent groups (Figure 2.13). The last kilobase of the D repeat was conserved in structure between all three rodents, whereas the first kilobase of the same repeat shared secondary structures only between mouse and rat. However, according to CARNAC, non-rodents do not share secondary structures in these regions. This might imply that rodents have unique functional domains of Xist that are not found in the other eutherians.

Figure 2.12. The Region Between the A and F Repeat for Rodents. A) CARNAC stems identified when non-rodents were excluded from the input. B) CARNAC structure from the corresponding input.

Figure 2.13. Rodent versus Non-Rodent, Region Between A and F Repeats. A) ClustalW alignment of non-rodent sequences in this region. B) ClustalW alignment of rodent orthologs. C) RNAalifold consensus structure generated from the alignment in (A). D) RNAalifold consensus structure from the alignment in (B).

Figure 2.14. D repeat in Rodents. A) Conserved structures found in mouse and rat for the first kilobase of the D repeat. No commonality was found in vole. B) Conserved structures of the last kilobase of the D repeat, as depicted in all three rodents.

Method 1 allowed me to analyze the internal exons, exon 2 to exon 5 (or exon 6 in rodents), within one 2 kb window. The seven eutherian Xist orthologs did not exhibit common structures in this region. Likewise, no conserved stems were found at the 3' end of Xist. A summary diagram of Method 1 results is shown in Figure 2.15.

2.3.5) Findings from Method 2 Analysis (Dotplot to Detect Orthology)

As expected, CARNAC found common structures in segment 1, yielding the same structures as above for "before A" of Method 1 (Figure 2.16). No conserved stems were seen for segment 2 (sequence between repeats A and F) or segment 3 (sequence between F and B). The dotplot method allowed the detection of regions in which rodent and non-rodent orthologs displayed different alignment patterns to human XIST (Figure 2.8). Analyzing the eutherian Xist segments in two separate categories improved CARNAC's ability to predict common stems within segment 4 (sequence after the B repeat, before the D repeat). Secondary structures were conserved within non-rodents, although the stems were sparsely distributed with stem lengths of 4-7 bp (Figure 2.17). In contrast, CARNAC predicted a complex secondary structure of segment 4 that was conserved in the rodents (Figure 2.18).
The consensus structures generated from RNAalifold showed distinct folding of RNA in both eutherian categories, confirming the differences in this region (data not shown). According to results from CARNAC, non-rodents share conserved secondary structures for segments 6 and 8 (parts of the D repeat), when rodent sequences were excluded from the input (Figures 2.19 and 2.20). In contrast, Method 1 detected no secondary structures from the beginning and end of the D repeat that were shared between human, cow, dog, and mole, despite analyzing non-rodent sequences independently from rodent sequences. Since the conserved structure predicted by CARNAC for segment 8 was a single stem of 6-7 bp within a >800 bp region, the significance of this stem is questionable. Additionally, since a short stem (3-10 bp within a >400 bp region) within segment 7 (D core) was conserved in only a subset of the seven species (Figure 2.20), this region likely does not carry a functional role. No conserved structures were found for segment 9 before the internal exons, or segment 10, consisting of the E repeat in the large 3' exon. A summary diagram for Method 2 analysis is shown in Figure 2.21.

Figure 2.15. Summary of Method 1 CARNAC Results. Structures found to be conserved in rodents only are shown on the top; those conserved in non-rodents only are shown on the bottom; and those conserved in both non-rodents and rodents are boxed in red. Structures conserved in all rodents are boxed in yellow.

Figure 2.16. Method 2, Segment 1 CARNAC Results. The stems with their locations are shown on the left. The corresponding structures of the six eutherians (excluding mole) are shown on the right.

Figure 2.17. Method 2, Segment 4 of Xist CARNAC Results for Non-Rodents. Common stems and their corresponding visual structures are shown on the left and right, respectively, when rodents were excluded from the input.

Figure 2.18. Method 2, Segment 4 of Xist CARNAC Results for Rodents. Common stems and their corresponding visual structures are shown on the top (A) and bottom (B), respectively, when non-rodents were excluded from the input.

Figure 2.19. Method 2, Segment 6 of Xist CARNAC results. Common stems and their corresponding visual structures are shown on the left (A) and right (B), respectively, when rodents were excluded from the input.

Figure 2.20. Method 2, Segment 7 and 8 of Xist CARNAC results. Common stems and their corresponding visual structures are shown for segment 7 (A) and segment 8 (B). Rodents were excluded from the input for segment 8 but not for segment 7. Segment 7 shows partial conservation in the subset of eutherians shown (mole and rat did not show conserved structures).

Figure 2.21. Summary of Method 2 CARNAC Results. Structures boxed in red are conserved between the seven eutherians, while structures in yellow are found in all rodents or all non-rodents (analyzed separately). Segment 7 is found only in a subset of the seven mammals (mouse, vole, dog, human, cow). Segment 2 was not evaluated by CARNAC because it corresponds to essentially the same input data as the A repeat from Method 1 (see text).

Figure 2.22. Xist Exon 4 CARNAC Results. Common stems (B) and their corresponding visual structures (A) and (C) of the seven eutherians.

As for exons 2-5/6 (rodents), only the exon 4 structure (Figure 2.22) was predicted to be conserved in all species tested, whereas the mouse, vole and rat also shared structural similarity for exon 5 (unique to rodents) (Figure 2.23). Mfold predicted stable hairpin structures for all eutherian exon 4 sequences, including the extended exon 4 in coast mole (Figures A.1 and 2.24). In fact, the minimum free energy of the extended mole exon was -139 kcal/mol, compared to -78 kcal/mol for the unextended version (Figure A.1), and -64 to -67 kcal/mol for exon 4 hairpins in the other eutherians (data not shown).
Exon 2 structures output by CARNAC were only common between rat and mouse (Figure A.6) and secondary structures predicted for exon 6 were only conserved between vole and mouse (Figure A.7). Of the non-rodent eutherians, only human and mole showed conservation for exon 5 (Figure A.8). Thus, it is unlikely that any of the exons, with the exception of exon 4 and rodent exon 5, play a role in X inactivation. Rodents and non-rodents showed distinct conserved structures when internal exons were treated as a whole unit (Figure A.9). A summary diagram for the CARNAC exon analysis is shown in Figure 2.25.

2.3.6) Control for CARNAC Output

Out of concern that the stems predicted by CARNAC were artifacts rather than real conserved structures, a randomization procedure was performed by Sohrab Shah on each region of Xist, as a control for chance stems. The randomization used a CLOTE Computational RNA shuffler (http://clavius.bc.edu/~clotelab/RNAdinucleotideShuffle/dinucleotideShuffle.html) that shuffled single sequences independently, while preserving the lengths of the input sequences, which varied between the species [85]. Each species sequence was randomized and sequences corresponding to the same region of Xist were entered together into CARNAC. Only Xist regions that were originally predicted by CARNAC to contain conserved secondary structures in all analyzed sequences were randomized. Input sequences were shuffled 1000 times per region of Xist, followed by CARNAC prediction. Only those resulting stems that were found to be conserved in all of the species from the shuffled data sets were included (i.e., trials where common structures were only predicted in a subset of species were discarded), in order to be consistent with the results given from the original data.
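The shuffling step of this control can be illustrated with a short sketch. The code below is not the CLOTE shuffler itself. It is a minimal Python version of an Altschul-Erikson style shuffle, written under the assumption (suggested by the tool's URL but not stated in the text) that dinucleotide counts are preserved along with sequence length. The function names and the example sequence are hypothetical.

```python
import random
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in a sequence."""
    return Counter(zip(seq, seq[1:]))


def dinucleotide_shuffle(seq, rng=None):
    """Shuffle `seq` while preserving its exact dinucleotide counts.

    Each nucleotide is a graph vertex and each dinucleotide an edge; a
    random Eulerian walk over the edges yields the shuffled sequence.
    """
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    successors = {}
    for a, b in zip(seq, seq[1:]):
        successors.setdefault(a, []).append(b)
    last = seq[-1]
    vertices = [v for v in successors if v != last]
    # Pick one "final exit" edge per vertex; retry until following those
    # edges from every vertex leads to the last nucleotide of the sequence.
    while True:
        final_exit = {v: rng.choice(successors[v]) for v in vertices}
        valid = True
        for v in vertices:
            seen, u = set(), v
            while u != last:
                if u in seen:
                    valid = False
                    break
                seen.add(u)
                u = final_exit[u]
            if not valid:
                break
        if valid:
            break
    # Shuffle the remaining edges and keep each final exit edge for last.
    order = {}
    for v, outs in successors.items():
        outs = list(outs)
        if v in final_exit:
            outs.remove(final_exit[v])
            rng.shuffle(outs)
            outs.append(final_exit[v])
        else:
            rng.shuffle(outs)
        order[v] = outs
    # Walk the edges in the chosen order, starting from the first nucleotide.
    used = {v: 0 for v in order}
    current = seq[0]
    result = [current]
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        result.append(nxt)
        current = nxt
    return "".join(result)


if __name__ == "__main__":
    original = "ACGUACGGAUUCAGCA"  # toy sequence, not real Xist
    shuffled = dinucleotide_shuffle(original, random.Random(1))
    print(shuffled)
    print(dinucleotide_counts(shuffled) == dinucleotide_counts(original))
```

A control along the lines described would call such a function once per species for each of the 1000 trials and pass the shuffled sequences for one Xist region to the structure prediction together.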
Using the same input sequences for the control as those used to generate the original Xist data also ensured that the nucleotide content and sequence lengths were constant. Individual stems resulting from the same species in a particular Xist region were aggregated in each data set to calculate total stem length per orthologous region in a given trial, so that stem lengths within the same number of nucleotides were compared between randomized and unrandomized samples. Since any secondary structures resulting from the shuffled input sequences were location independent, and not based on real Xist data, they represent stems that are common due to chance rather than to biological significance. The average total stem length per orthologous region generated from the original Xist data set was compared to the average total stem length per orthologous region in the randomized samples using a T-test, to estimate how similar the original output was to "chance." This control procedure assumes that stems longer than those conserved by chance are likely to be biologically significant. Summary values of the standard deviation, range, and mean stem lengths are listed in Table 2.2.

Figure 2.23. Xist Rodent Unique Exon 5 CARNAC Results. Common stems and their corresponding visual structures are shown in (A) and (B), respectively. C) RNAalifold consensus structure.

Figure 2.24. Consensus Structure and Multiple Alignment of Exon 4. A) ClustalW alignment of a portion of exon 4. B) RNAalifold consensus structure as predicted from the alignment input. C) Mfold structures of exon 4 in the eutherians.

Figure 2.25. Summary of CARNAC Results in Exonic Portions. Structures conserved between all eutherians are boxed in red, while structures found in all rodents or all non-rodents are in yellow. Structures found only in a subset of these groups are shown.

After shuffling 1000 times for each region of Xist, the resulting stem lengths produced distributions with very broad variances, giving large standard deviations (5.09-36.182). After 1000 sets of input data into CARNAC, total stem lengths of virtually all sizes (e.g., 3-187 bp) were predicted from chance. Nevertheless, there were notable Xist regions whose unshuffled average stem lengths were significantly different (defined as p<0.05) from those of the corresponding shuffled data sets, highlighted in grey in Table 2.2.
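The comparison between shuffled and unshuffled stem lengths is described as a two-tailed T-test assuming unequal variances, which corresponds to Welch's test. As a minimal sketch, the Python function below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` and the example values are hypothetical and are not taken from Table 2.2.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors, from sample variances (n - 1 denominator).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


if __name__ == "__main__":
    # Hypothetical total stem lengths (bp), for illustration only.
    unshuffled = [5, 9, 13, 14, 27]
    shuffled = [2, 4, 7, 8, 9, 11, 12, 15, 20, 41]
    t, df = welch_t(unshuffled, shuffled)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

The two-tailed p value then follows from a t distribution with `df` degrees of freedom; a statistics package can supply it directly, for example `scipy.stats.ttest_ind` with `equal_var=False`.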
The regions with significantly different stem lengths included the F repeat and segment 7, when all species were analyzed; the sequence after the A repeats, when only rodents were analyzed; and segment 8, when only non-rodents were analyzed. However, all of these Xist segments displayed significantly shorter mean total stem lengths when compared to randomized sequences. In fact, exon 4 was the only region found to show a higher mean total stem length compared to that from chance, although this difference was not found to be significant, with p = 0.109. The A repeat and rodent-specific exon 5 showed the next lowest p values of 0.177 and 0.201, respectively.

Table 2.2. Summary of Randomized Vs. Original Xist CARNAC Results. Values for stem lengths after randomization of Xist sequences are shown (left) in comparison to stems from the corresponding original, unshuffled Xist data (right). Significance values are from t-tests (two-tailed) between randomized and unrandomized samples assuming unequal variances. Significant difference is defined at p < 0.05. Regions that showed stems that were significantly different from stems from shuffled data are shown in grey and their p values are marked with an asterisk*. Highlighted in yellow is a region that shows a higher mean stem length from the original data compared to shuffled data, although the p value was not significant.

Xist Region                    Conservation                      Randomized                Unrandomized              T-test
                                                                 Min   Mean  Max   SD      Min   Mean  Max   SD      p value
Before A                       All eutherians                    2     9.44  37    5.27    3     8.33  16    4.80    0.597
A repeat                       All eutherians                    3     9.14  33    5.09    5     7.67  10    2.25    0.177
After A                        All rodents                       4     73.5  150   27.4    37    42.0  49    6.24    0.009*
F repeat                       All eutherians                    3     11.6  40    6.70    4     6.50  10    2.17    0.002*
D last kb                      All rodents                       4     99.1  187   36.2    63    90.0  109   24.0    0.580
Exon 4                         All eutherians                    2     8.96  41    5.45    5     13.9  27    5.40    0.109
Exon 5 Unique                  All rodents                       4     20.4  37    4.72    5     12.7  19    7.10    0.201
Combined Internal Exon Region  All rodents                       4     80.6  164   29.1    67    72.3  81    7.57    0.195
Segment 4                      All non-rodents                   3     15.8  42    7.30    5     5.23  7     2.23    0.238
Segment 4                      All rodents                       4     15.5  62    6.43    5     5.93  12    2.54    0.275
Segment 6                      All non-rodents                   3     15.8  63    10.9    14    25.2  44    1.71    0.263
Segment 7 (D-core)             All eutherians except mole & rat  3     9.82  26    4.78    3     5.80  10    3.03    0.038*
Segment 8                      All non-rodents                   3     13.7  45    7.49    6     6.75  7     0.50    0.000*

2.4) Discussion

Consistent with past comparative sequence studies, rodents have undergone rapid sequence divergence from human compared to the other eutherians studied [3]. In all cases of pairwise sequence comparisons of Xist, rodents showed comparatively low sequence similarity to the non-rodents, but higher similarity to one another. The high intron similarity between the rodents is expected because of their relationship on the phylogenetic tree, but if the radiation of Xist were uniform, mouse-human sequence similarity would not be expected to be significantly lower than cow-human similarity (for example).

Structures that were conserved in one eutherian category but not the other provide insight into functional differences of Xist across species. These include segments 6 and 8 (D repeat) structures, unique to non-rodents; as well as the last kilobase of the D repeat and exon 5, unique to rodents (Figure 2.26). In addition, those regions that showed RNA structure conservation in both rodent and non-rodent groups (whether or not the structures differed between the groups) reflect functional constraints of the region that presumably still allowed divergence due to co-evolving partner molecules. Xist segments belonging to this category include the sequence between A and F, segment 4, and the internal exon region. Of greatest significance are those structures predicted by CARNAC to be conserved amongst all seven eutherians, given the divergence of the species analyzed.
In particular, the region before the A repeat, the A repeat, the F repeat, and exon 4 contain secondary structures shared by all Xist sequences entered into CARNAC. However, the randomization control suggests that only a subset of these CARNAC structures may be meaningful biologically. In particular, exon 4 exceeded the average stem length generated by chance, although not significantly. The A repeat in all eutherians and the rodent-conserved internal exon 5 structures also gave relatively low, although not significant, p values when compared to the rest of Xist. On the other hand, some regions of Xist form conserved stems that are shorter than expected by chance, which include the sequence between the A and F repeats in rodents, as well as the F repeat in all eutherians tested (segment 7 is only conserved in a subset of species). Although most of the stem lengths found in the original analysis overlapped with those lengths that can arise from random nucleotide sequences, this does not signify that conservation of structures in those regions does not exist. Instead, CARNAC with its high specificity (true positive rate of 85-93%, but false negative rate of 33-49%) likely outputs real (but partial) common structures [82], but whether those common structures are biologically significant is a question to be addressed with experiments.

Since the silencing function of the A repeat has been demonstrated in mouse and human by deletion studies [64], and its primary sequence, copy number, and secondary structure are preserved in all eutherians examined, there is little doubt that this region is crucial to inactivate the X chromosome. This function is presumably achieved by recruiting initial players involved in heterochromatinization. Such players include G9a and the Eed/Enx1 complex, which lead to early histone methylation at H3K9 and H3K27 on the inactive X [31, 64].
However, mutants with G9a deficiency still display normal X inactivation and subsequent maintenance [86], and H3K27 trimethylation by Eed/Enx1 is independent of silencing [32]. Thus, the A repeat likely does not interact with these factors, but instead recruits unidentified players crucial for establishing silencing. Deletion of the A repeat in human not only abolishes the silencing function of XIST, but appears to additionally affect stability or localization of the XIST transcript, as only a pinpoint XIST signal is seen on the inactive X using RNA-FISH (personal communication, Jennifer Chow). Deletion studies in mouse also suggest that the A repeat contributes to cis-localization [64]. Hence, domains of Xist seem to have overlapping roles.

Figure 2.26. Proposed Functions of Xist Regions. Structures conserved among non-rodents are shown on the top; structures conserved among rodents are shown on the bottom; structures shared between both rodents and non-rodents are shown in the centre. Proposed functions of conserved regions are shown in the centre. Functional differences possibly arising from structural differences in Xist RNA between rodents and non-rodents are shown on the bottom.

The exon 4 hairpin is conserved across all species, indicative of a functional role in the inactivation process. In mouse, deletion of exon 4 does not lead to obvious changes in silencing or localization. It could instead have a role in stability or processing of the Xist RNA [69]. Alternatively, it could bind to epigenetic modifiers important for the maintenance of inactivation status.
This is compatible with the fact that maintenance was not thoroughly investigated in the exon 4 deletion study by Caparros et al. [69].

Because different mammals do not share a common secondary structure in the Xist B, C and E repeats, it is unlikely that RNA folding in these regions plays a crucial functional role in X inactivation. Despite the good sequence similarity across species, the B repeat does not fold into any globally stable structures (Mfold), but rather gives many tiny stems with an overall minimum free energy [MFE] of -11 kcal/mol (data not shown). This argues that if the B repeat indeed plays a biological role in X inactivation, the primary sequence of this repeat is more important than its secondary structure. Perhaps it forms a necessary DNA site of Xist for trans-acting factors to recognize, rather than a single RNA motif whose shape is important to bind DNA or other proteins. Thus it might be a binding site for trans-acting proteins such as histone modifiers, DNA methyltransferases, or the Scaffold attachment factor-A (SAF-A) structure to locally constrain Xist within the territory of its expression site and prevent the transcript from drifting in the nucleus [68]. This SAF-A structure is thought to be a component important for maintenance of silencing rather than induction. Because past genetic studies did not address the influences of Xist deletions on the extent of silencing, the B repeat may interact with any proteins involved in the maintenance stage of silencing.

In contrast, the E repeat is highly diverged even at the sequence level, so its presence in different eutherians could reflect the important role of expanding Xist. Increased size could be necessary for the transcript to sufficiently coat the X chromosome. Alternatively, the E repeat could prevent degradation of the Xist molecule by forming a protective secondary structure (the precise nature of the structure would be irrelevant). However, if the E repeat were important for the high stability of the Xist molecule (half life is over 5 hours), deletion of the 3' end of Xist should demolish all Xist activity, as the transcript would be quickly degraded. Instead, mouse and human deletion studies indicate that truncation of Xist at the 3' end still allows Xist to silence the X chromosome, as long as the A repeat is intact, although localization is reduced [64].

The C repeat is likely dispensable due to its absence in mole and cow, the presence of only one copy in humans, and its truncation in rodents. In accordance with this, deletion of the C repeat in human failed to cause any apparent disruption to XIST localization or silencing of the X chromosome (personal communication, Jennifer Chow). However, the C repeat is implicated in chromosome binding in mouse because peptide nucleic acid (PNA) interference to this region led to complete loss of Xist localization to the inactive X [87]. Either this C repeat function is unique to rodents (as the repeat is largely expanded in rodents) or the conflicting results from deletion versus PNA studies simply reflect different experimental methods.

The D repeat is distinct between rodents and the other eutherians in terms of both sequence composition (as depicted in dotplots) and structure (as predicted by CARNAC). Deletion studies in mouse illustrate that both D and E are important for macroH2A localization to the inactive X, which explains why Xist expression causes macrochromatin body formation [88]. However, it is unclear whether these repeats recruit macroH2A to the inactive X in human and other non-rodents. Since regions within the D repeat are diverged between rodents and non-rodents, the domain required to interact with macroH2A could lead to differences in the maintenance of silencing, which is the stage of X inactivation during which this histone variant is recruited.
This may translate into disparities in the extent of silencing of the X chromosomes in rodents compared to non-rodents.

The copy number of repeat A (around 8X) is conserved in the mammals studied, whereas B, C, D, and E are differentially expanded or shrunken in the various species. In mouse, various parts of the 3' end and the C, D and E repeats have been experimentally shown to function cooperatively for localization [64]. This may explain the variation observed across eutherians in repeat copy numbers: the shrinkage of one of the repeats is not detrimental to Xist function since it can be compensated by the expansion of the other two repeats. The redundancy of the C, D, and E repeats might also explain why human XIST is functional for localization in a mouse background [76]. Perhaps the divergence of the C, D, and E repeats at the sequence and structural levels accounts for a small degree of species selectiveness, that is, how well the transcript localizes when it does. This could account for the differences between human and mouse in XIST/Xist binding affinity to X chromosomes, as observed in M-phase, as well as the fact that human XIST localizes only partially to the mouse autosome from which it is transcribed [76].

Lastly, the conservation of the repeat F structure predicted by CARNAC, despite differences in monomer copy number, implies that the single structure from the combined sequence itself is more important than the reiterated structures formed from independent copies. Results from the randomization control reveal that the F repeat is more likely to form longer stems by chance than in the original Xist context. The genomic composition in this region may inherently counter the formation of larger stems. This feature may have been selected for during evolution if the region is a useful site for binding DNA.
The same idea applies to the sequence between the A and F repeats, segment 8 of the D repeat (unique to non-rodents) and the D core (present in five of seven eutherians) because they led to smaller conserved stems compared to chance.

The problem with a bioinformatics approach is that one can never be certain of the biological significance of the output unless the candidate regions are later tested experimentally. RNA conservation analysis tools are currently not at the stage to account for interaction between domains and transient structures of the RNA. Additionally, the current tools available publicly have at least one of the following problems: computationally complicated, long running time, unable to deal with large size, cannot detect pseudoknots, cannot deal with diverged sequences, and/or no quantitative output. Until all of these factors can be addressed, experimental approaches are ultimately best suited to define functional domains. However, since it is not feasible to perform shotgun or progressive deletions in RNAs of such a massive size in all eutherians of interest, bioinformatics provides a tool to narrow down regions of potentially functional importance and to look at many animals at the same time. Once a crude picture of functional domains is drawn, transgenic studies using Xist/XIST cDNA constructs in embryonic stem cells or somatic cell hybrids will reveal the importance of these regions. Additionally, one can test the binding affinity of a defined region of Xist to a candidate interacting molecule by mobility shift assay. Pull-down experiments using biotinylated Xist RNA can reveal important interacting players in X inactivation. One could also replace the repeats of one species with the same segment in another species to test whether the repeats are interchangeable. Even if the results of this present research have not uncovered additional regions of conservation, the work has highlighted the need for algorithms better suited to large and divergent functional RNAs such as Xist.

Chapter III
Comparative Survey of Inactivation Status in Multiple Eutherians

3.1) Introduction to Genes that Escape Inactivation

3.1.1) Origin of Mammalian Sex Chromosomes

The mammalian sex chromosomes likely differentiated from an ancient autosomal pair 300-350 mya, shortly after the avian-mammalian divergence (Figure 3.1) [89]. These autosomes are independent from those that gave rise to the avian sex chromosomes, as is evident in comparative mapping studies. The region proximal to human Xp11.23, otherwise known as the X conserved region (XCR), which is present on the X chromosome in all mammalian subclasses (prototherians, metatherians, and eutherians), is predominantly syntenic to chicken chromosome 4p, rather than to the avian sex chromosomes [90]. Conversely, the Dmrt1 gene, which is on the Z sex chromosome in birds, is homologous to a region on human autosome 9, rather than to the human sex chromosomes [91, 92].

According to the leading theory of mammalian sex chromosome evolution, the proto X and Y began to differentiate when one allele of the proto-sex chromosomes acquired a sex-determining function. This was possibly achieved by a dominant mutation in the SOX3 gene, changing it into a new penetrant SRY allele. SRY must have emerged after the prototherian-therian divergence because sex-specific SRY is not found in monotremes or other vertebrates (but SOX genes are) [93]. Rearrangements such as inversions that could suppress recombination on the proto-Y were positively selected because this allowed genes evolving similar sex-specific roles to be inherited on the same chromosome. This recombination suppression allowed the proto-Y to be genetically isolated and to differentiate from the proto-X [94].
Figure 3.1. Evolution of Mammalian Sex Chromosomes. Mammalian sex chromosomes likely derived from an ancestral pair of autosomes where the emergence of a sex determining allele on the proto-Y would favor recombination suppression mechanisms such as inversions on the Y chromosome. It is thought that at least four such rearrangements (labeled 1-4) have occurred to allow for genetic isolation of the Y and sex chromosome differentiation. Decay of the Y necessitated a dosage compensation mechanism [89, 94-96]. XAR = X added region; XCR = X conserved region; PAR = pseudoautosomal region.

At least four major rearrangements on the Y have occurred through time, based on footprints left from the evolutionary process on present day human sex chromosomes, in the form of distinct strata. Lahn and Page (1999) grouped genes on the X into four regions ("strata") based on their percent X-Y divergence, which reflected the time each region had to accumulate mutations. The most diverged region was coined stratum 1, which is the oldest part of the sex chromosomes (the first to differentiate). From their study, they identified three other strata, with stratum 4 being the most recent addition onto the X chromosome, also known as the distal Xp pseudoautosomal region 1 (PAR1), named for its retention of highly homologous X-Y gene pairs, which are able to synapse, recombine, and be inherited in a manner similar to autosomal material. Based on comparative locations across species, Lahn and Page further estimated that stratum 1 is 300-350 my old, stratum 2 is 130-170 my old, stratum 3 is 80-130 my old, and stratum 4 is 30-50 my old.
Thus, strata 1 and 2 correspond to the XCR, because they existed before the time of prototherian-therian divergence and thus exist on the X chromosome in all mammalian subclasses. However, both stratum 1 and stratum 2 contain exceptions that are found on the eutherian X chromosome yet are autosomal in marsupials and monotremes. This is because at least seven stratum 1 genes were independently translocated to autosomes in the monotreme lineage and an inversion on the X chromosome involving stratum 2 presumably occurred after material was independently added in the eutherian lineage (Figure 3.1). The oldest X-Y gene pair corresponds to SOX3/SRY in stratum 1, in accordance with the sex chromosome evolution model. Comparative mapping to the chicken genome has revealed that the XCR stratum 1 is syntenic to chicken 4p, while XCR stratum 2 maps to a variety of chicken chromosomes (predominantly 1, 4, and 12). On the other hand, strata 3 and 4 constitute recently added material, before and after the metatherian-eutherian divergence, respectively [95]. They correspond to the X added region (XAR), distal to Xp11.23, not found on the X chromosome in monotremes [96]. The majority of the XAR is syntenic to chicken chromosome 1q, with the exception of the pseudoautosomal regions [90]. Different autosomal material has been independently added to the pseudoautosomal regions in separate eutherian lineages, as is evident in the differing gene compositions of PAR1 across species [97].

Without the potential benefit of recombination to reverse the effects of deleterious mutations that accumulated on the proto-Y, the Y began to shrink because mutated, functionless genes could become deleted without consequence. Indeed, the current human Y chromosome is enriched for pseudogenes.
However, it is also abundant in large palindromic repeats, which are postulated to provide an alternative to recombination (i.e., gene conversion), necessary to prevent complete degradation of the Y [94]. The abundance of mutations and decay that occurred during evolution allowed the Y to slowly diverge from the X. Loss of functional genes on the Y chromosome resulted in dosage imbalance between females with two X chromosomes and males with only one X, which necessitated X inactivation as a dosage compensation mechanism (in mammals) to cope with detrimental differences between the sexes.

3.1.2) Genes that Escape X Inactivation

Escape as a Consequence of Sex Chromosome Evolution

The expression status along the inactive X chromosome in females broadly reflects the time of X-Y divergence, as less divergence corresponds to intactness of the Y homolog and dosage equivalence between males and females (Figure 3.2). Genes on the newer strata would be expected to escape inactivation in females, while those on the older strata would inactivate. Consistent with this model, most escapees reside on the short arm of the X chromosome, and the escape pattern diminishes from the short arm to the long arm (XCR), where the majority of genes are subject to inactivation [98, 99].

Genes whose Y homologs have been retained would be expected to escape inactivation, and those whose Y homologs have decayed or have become non-functional would be expected to inactivate. This hypothesis has been verified for at least three loci by examining their methylation status in a wide variety of eutherian mammals, whether the genes are located on the XCR or XAR. RPS4X is unique in that it appears to escape only in primates, which have retained a functional Y homolog. On the other hand, Zfx seems to be subject to silencing only in rodents, which show essentially no Y homolog expression in somatic tissues because Zfy has become testis-specific in this lineage.
As expected, Ald, which is on the XCR, with no Y homolog in any eutherians examined, was consistently subject to inactivation [100]. However, Jarid1C did not show any discernible pattern of escape or inactivation in the eutherians [100].

Numerous genes on the human X do not possess Y homologs but similarly escape inactivation in human females [99]. These genes might confer female specific traits, such that two doses distinguish a female from the single dose in males. Marsupials seem to require a double X chromosome dosage to stimulate some female characteristics while inhibiting some male features [2]. This is likely achieved by genes without Y homologs that escape inactivation, allowing for sex differences. Alternatively, dosage differences may be inconsequential at some loci. Lastly, it is likely that levels of regulation exist beyond transcription that might alter dosage effects at the transcript or protein level, such that escapees of X inactivation do not translate to differences in eventual product concentration. Consistent with this, analysis of male versus female transcriptomes failed to detect over-expression of many genes in females observed to escape inactivation in other studies [101]. Nevertheless, a high number of genes located within the XCR, especially in the short arm, that do not possess Y homologs still escape inactivation and this number seems to be greater in human versus mouse.

Figure 3.2. Extent of Genes that Escape Inactivation in Human versus Mouse. E = escape status; I = subject to inactivation; * = unique escape or inactivation status; X chromosome on left and Y chromosome on right of each pair. A) The leftmost graph shows that the number of genes that escape inactivation increases towards the distal p arm of the X chromosome. Origins of the regions on the X are indicated [90, 105]. B) The seven genes that are known to escape inactivation on the mouse inactive X are shown.

Human vs. Mouse: To Escape or not Escape?

In general, the mouse X chromosome displays more thorough inactivation compared to the human counterpart. To date, only seven examined genes are known to escape silencing: Enox, Utx, Mid1, Jarid1c (formerly Smcx), Dbx, Eif2s3x, and Sts (with Sts being a PAR1 gene) (Figure 3.2). The small number of loci that have been studied in mouse might not represent the entire mouse X chromosome. However, the absence of an abnormal phenotype in 39,XO mice compared to the Turner phenotype in 45,XO humans suggests incomplete silencing of the human X relative to the mouse chromosome. Consistent with this, an extensive expression analysis of 624 X-linked genes and ESTs confirmed that over 15% of the human loci escape inactivation and an additional 10% show variable expression from female to female [99]. Cattle monosomic for the X chromosome also show Turner syndrome, developing only streak gonads, improper ovaries and no ovulation. Analogous to human, the XXY cow displays Klinefelter syndrome, with small testes and sterility [102]. Therefore, the escape pattern in cow resembles that in human and contrasts with that in mouse. An exploration into other mammals may reveal the factors that explain the observed diversity across eutherians.

Other Considerations of Escape

When comparing human and mouse, where the retention of Y homologs can differ for the same locus, it is unclear what allows the silencing signal to skip the same locus in one species but not the other.
Sequence comparisons between the 5' CpG islands of ZFX/Zfx fail to reveal any significant differences in sequence and structural elements [103, 104], arguing against cis-elements within the gene as being important regulators for expression. Delineating the factors that influence the expression of X-linked genes and finding species differences in these factors are the focus of Sections 3.2 - 3.4.

In humans, the presence of blocks of escapees suggests domain-regulation of X-linked genes. The human escape cluster in Xp11 contains a region of distinct chromosomal origins (chicken chromosomes 1, 4 and 12) relative to the rest of the XCR (chicken 4p), including a number of XAR genes [90, 105]. This region corresponds to a portion of the X that presumably underwent a minor inversion spanning both the XCR and XAR (Figure 3.1). A comparison with the corresponding block in mouse shows a lower level of long terminal repeats (LTRs) in human than in mouse, suggesting that LTRs may enhance silencing in mouse [106]. CTCF insulators have recently been discovered at transition points between loci subject to inactivation and those that escape [107], which may help maintain open chromatin domains within larger heterochromatic contexts.

Gene-specific regulation of X inactivation is evident on the mouse chromosome, as a small number of genes escape silencing despite being surrounded by larger domains subject to inactivation. However, analysis of the mouse Jarid1C promoter has not revealed any characteristics of this gene that could explain its unique escape status [106].

It is unclear whether escape status on the inactive X chromosome arises due to lack of maintenance factors for stable inactivation, or an initial resistance to Xist-dependent silencing (the concept of "pre-commitment").
The mouse Jarid1c gene, which is initially subject to inactivation and then reactivates early in development [108, 109], illustrates that reactivation leads to escape in some situations. In addition, females with ICF syndrome, associated with hypomethylation due to mutations in the DNMT3B DNA methyltransferase, show abnormal escape in a portion of cells [110]. Although this implicates a role for methylation in maintenance, it also demonstrates that the absence of methylation is insufficient to reactivate X-linked genes in all cells. TIMP1, which shows variable low-level expression despite methylation of its promoter, demonstrates that methylation is not enough to maintain a consistent inactivation status. However, TIMP1 may be predisposed to becoming expressed by the acetylation of histone H3K9, a mark found on both naturally expressed TIMP1 genes and those successfully induced to be expressed [111].

Thus, it appears that escape might involve both pre-commitment and maintenance. Loci may inherently bear properties that make them more prone to reactivation later. Monoallelically expressed genes, including imprinted genes and genes that are subject to inactivation, have H3K4 dimethylation restricted to their promoters, whereas biallelically expressed genes show the same marks in exonic regions as well as their promoters [112]. The additional mark in the exon may predispose these genes to being expressed rather than silenced. Interestingly, these differences in modifications are most drastic in undifferentiated stem cells compared to differentiated fibroblasts, highlighting their role in marking the gene status prior to inactivation.
3.1.3) Differences Between Species: Imprinting, Methylation, and Escape

Unlike the random inactivation in all tissues of humans, paternal X chromosome inactivation occurs in all tissues of marsupials and monotremes, as well as in the extraembryonic membranes of rodents and cows (reviewed in [48]). The non-random inactivation pattern observed in marsupials has been attributed to early cleavage events that prevent erasure of imprints on the paternal X chromosome after blastulation (reviewed in [113] and see Section 1.3) (Figure 3.3). In mice and in cattle, the early stage cleavage occurs later than in marsupials. Paternal inactivation is observed only in extraembryonic lineages in these two species. The first cleavage stage in humans is comparatively late; this presumably allows more time for the erasure of imprints to occur before new epigenetic marks on the future inactive X are established randomly in all tissues [25, 48, 113]. Humans/primates are thus unique in terms of not normally displaying imprinted inactivation.

In eutherians, methylation of CpG islands on the inactive X maintains the silencing signal. Methylation is conserved within Eutheria, associated with inactivation in diverged species such as the coast mole [84]. However, marsupials do not show a clear association between methylation and silencing, consistent with the incomplete, unstable, tissue-specific inactivation observed in metatherians [2]. Because late replication timing and histone deacetylation are associated with silencing in metatherians, but methylation of CpG islands is not, DNA methylation is likely a more recently evolved maintenance mechanism of repression [18]. Like marsupials, incomplete and unstable inactivation is observed in the early embryo of mouse before the blastocyst stage; and Jarid1c in mouse is variably expressed depending on the stage of development and tissue-type [25, 108, 109].
Figure 3.3. Differences between Mammals. Comparisons between prototherians, metatherians, human, cow and mouse in terms of cleavage time, imprinting, Xist and Tsix functionality, sex chromosome pairing, and silencing characteristics.

Feature | Prototherians | Metatherians | Human | Cow | Mouse
Imprinting | Paternal; tissue specific | Paternal; tissue specific | Random | Paternal in extraembryonic; random in embryo | Paternal in extraembryonic; random in embryo
Tsix Functionality | ? | ? | Tsix | Tsix | Tsix functional
Presence of Xist | ? | ? | + | + | +
Features of Inactivation | Unstable; tissue variable; late replication seen | Unstable; lack of methylation; tissue variable; late replication and histone deacetylation | Stable but can be variable between individuals; late replication, histone deacetylation and DNA methylation | Stable, unknown variability; late replication, histone deacetylation and DNA methylation | Stable but can be variable depending on developmental time; late replication, histone deacetylation and DNA methylation
Sex Chromosome Pairing | X and Y synapse (large region) | X and Y pair, no PAR (small region via modified axial elements) | X and Y pair at PAR | X and Y pair at PAR | X and Y pair at PAR
Extent of Silencing | Incomplete | Incomplete silencing | Many escapees | ? | Mostly silent
Xist Binding Affinity | ? | ? | Xist binds until prophase | ? | Xist binds until metaphase

In humans, the loci that show variability between females [99, 108, 109, 114] might reflect unstable inactivation due to absence of methylation, as variable loci tend to lack CpG islands [99]. Thus it is possible that there is a progression from unstable, variable, partial inactivation in the early branching mammals to variable, incomplete inactivation seen in humans, to variable, nearly complete inactivation observed in mice. The mouse X chromosome might represent the eutherian X at an advanced state of acquiring epigenetic modifications that increase the stability and completeness of silencing.
An examination of other eutherians will clarify whether the silencing pattern in mice is unique.

3.2) Introduction to Methylation Analysis

Not only is the initial and fundamental step of Xist coating the eutherian X chromosome in cis poorly understood (Chapter 2), but the mechanism by which this results in silencing along the X chromosome has similarly remained unclear. In particular, an understanding of why and how some genes escape inactivation, and furthermore, why the same loci differ in inactivation status between species, would provide insight into the mechanisms regulating silencing in different mammals and genes. Comparative studies in multiple eutherians could reveal factors important for silencing that are discrepant in human and mouse, and clarify the roles of previously hypothesized factors in the extent of inactivation. In this study, I examine whether the presence of Y homologs, distance of the loci from the XIC, age of the region on the X, proximity to constitutive heterochromatin, and different generation times influence the escape status of X-linked genes in different species.

3.2.1) Generation Time

The rapid generation time of the mouse relative to human might explain the greater extent of inactivation. Because the number of mutations increases with a greater number of replications, species with shorter generation times would be expected to accumulate more mutations per unit time than those with longer generation times. Thus, mouse sex chromosomes could represent more mutated forms relative to the human X and Y, exhibiting a higher amount of Y decay and a consequently greater need for more extensive inactivation along the X chromosome. Using this logic, rodents with shorter generation times would be expected to show more complete inactivation along the X chromosome, compared to insectivores, artiodactyls, and primates with longer generation times.
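To make the generation-time argument concrete, the following minimal sketch computes the expected number of replication-driven mutations per site per year. It assumes, purely for illustration, that mutations arise only from germline replication errors and that the error rate and the number of germline replications per generation are identical in both species. All numeric values, including the two generation times, are placeholders and are not measurements from this study or from the cited literature.

```python
def mutations_per_site_per_year(error_rate_per_replication: float,
                                replications_per_generation: float,
                                generation_time_years: float) -> float:
    """Expected replication-driven mutations per site per year.

    Hypothetical helper: assumes replication error is the only source
    of mutation, which is exactly the premise under debate.
    """
    generations_per_year = 1.0 / generation_time_years
    return (error_rate_per_replication
            * replications_per_generation
            * generations_per_year)


# Placeholder values for illustration only (not empirical estimates).
ERROR_RATE = 1e-10     # errors per site per replication (assumed)
REPLICATIONS = 50      # germline replications per generation (assumed)

short_generation = mutations_per_site_per_year(ERROR_RATE, REPLICATIONS, 0.25)
long_generation = mutations_per_site_per_year(ERROR_RATE, REPLICATIONS, 20.0)

# Under these assumptions the ratio depends only on the generation times.
fold_difference = short_generation / long_generation
print(f"fold difference in mutations per year: {fold_difference:.1f}")
```

Under these assumptions the per-year rate scales with the inverse of generation time, which is the prediction that the relative rate studies discussed next either support or contradict.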
However, there has been controversy over whether the generation time hypothesis holds true. Studies from over a decade ago suggest that synonymous mutation rates are consistently higher in the mouse compared to the cow and human, in the genes investigated [115]. On the other hand, a later DNA/DNA hybridization experiment followed by relative rate testing, using artiodactyls with similar biology/metabolic rates and different generation times, found no evidence of a greater accumulation of nucleotide changes in the species with shorter generation times [116]. Furthermore, a computational analysis of 5669 genes (17208 sequences) in 326 eutherians for mutation rate differences led to the conclusion that mutation rate is approximately constant per year across lineages and largely similar among genes [117]. Based on those results, the authors argued that overall mutation rates are influenced by factors that play larger roles than DNA replication errors in germ cells. Yet, a recent assessment of more than 700,000 bp of full-length cDNA in human, pig, and mouse found that mouse and pig showed 1.44 and 2.86 times as many synonymous substitutions as human, although the rates of non-synonymous mutations were similar [118]. In addition, Margulies et al. (2005) found that the substitutions per site were higher in rodents compared to the hedgehog (belonging to the same mammalian order, Insectivora, as the coast mole), cow and human [3]. A major criticism given by the authors of the above computational analysis was that studies prior to it used either a small number of genes or a small number of species [117]. A large number of genes is important because only a fraction (~15%) of positions in a sequence are four-fold degenerate (expected to harbor only synonymous substitutions), which is necessary to test mutation rates in the absence of selection, according to the nearly neutral theory of evolution [117]. Because the Margulies et al. (2005) study looked at a large number of placental mammals and sequences, it seems to have rebutted this argument, showing that mutation rates vary in different eutherian lineages, with the mouse showing branch lengths longer than other mammals investigated [3]. Therefore, it remains a valid question whether the overall proportion of genes that are expressed on the inactive X chromosome decreases in species that have short generation times.

3.2.2) Constitutive Heterochromatin

Because the mouse X chromosome is acrocentric (some say "telocentric"), whereas the human homolog is metacentric, the presence of constitutive heterochromatin in the middle of the human X might hamper the spreading of XIST, resulting in more abundant escape in proximal regions as well as the distal Xp located opposite from the XIST locus. Indeed, Xist coverage is absent in G-dark metaphase bands, demonstrating that the RNA transcript has preferential affinity for non-constitutive heterochromatin [119]. In addition, intercalary constitutive heterochromatin between the autosome and X is selected for in species that have acquired X;autosome translocations, presumably because it prevents silencing from spreading to the autosome [120]. Since the mouse centromere is located distally, whereas its Xist locus resides near the centre of the chromosome, this arrangement could facilitate the spread of silencing by mouse Xist by allowing access to appropriate binding sites on the sex chromosome. If this were the case, then mammals with acrocentric X chromosomes (rat, river buffalo, sheep) [121] should show more complete silencing than mammals with metacentric sex chromosomes (coast mole, cow, human).

3.2.3) Distance from the XIC

Better spread of inactivation correlates with proximity to the XIC, as demonstrated by the gradient effect both in X;autosome translocations [59, 122] and in mouse early embryos, before inactivation becomes stable [25].
Thus, regions that have a predisposition to not be silenced or to be reactivated might have an increased chance for escape if they are situated at a greater distance from the Xist locus. The locations of the genes on the X chromosome differ between species, which allows for correlational studies of distance from the XIC with expression status of the same loci across eutherians. A gene located proximal to the Xist locus in one species, but distal to the XIC in another species, would be expected to be subject to inactivation in the former species but to escape inactivation in the latter.

3.2.4) Evolutionary Age and X/Y Divergence

The age of the different regions on the X chromosome generally correlates with inactivation status; the older regions are more prone to silencing because they are less likely to contain functional Y homologs and/or more likely to have accumulated way-stations (if they do exist). However, on the human X chromosome, in addition to the genes that escape inactivation in the newer XAR on the short arm, escapees also reside in the XCR of the short arm (stratum 2), as well as the long arm XCR [99]. Interestingly, the genes that are devoid of silencing in stratum 2 are syntenic to different chromosomal origins (chicken 1, 4, 12) than the rest of the XCR (chicken 4p) [90, 105]. One human escape cluster at Xp11 includes genes derived from both the X conserved region (XCR) and X added region (XAR), so the relationship between how long (in evolutionary time) these genes have spent on the X chromosome and why they escape inactivation is unclear. Because a significant portion of genes that escape inactivation in humans also do not possess Y homologs, escape cannot be strictly for appropriate dosage regulation. An examination of other species will reveal in which cases the presence of Y homologs correlates with escape status.
The purpose of the present study is to use different species to test the expression status of genes that show discrepant status in human and mouse, in order to distinguish whether the incomplete inactivation is unique to primates or is representative of other eutherians. This will contribute to our understanding of what characteristics lead to discordant silencing status across these species, and whether these same characteristics help to predict whether a gene is prone to inactivation or escape. In addition to using human and mouse as comparative controls, two other eutherians, the cow and mole, have been chosen based on the availability of both female and male cell lines in the lab, which is necessary for DNA extractions for the assay. Given that the mole is a member of Insectivora and the cow belongs to Artiodactyla, their distributed positions relative to human and mouse on the phylogenetic tree also make them useful species for representing distinct eutherian lineages. Because methylation at the CpG islands of genes is associated with silencing, and this DNA modification is conserved within Eutheria, methylation analysis was used to assay for inactivation status.

3.3) Results

The experiments were conducted using the methylation-sensitive enzyme HpaII, where the presence of a band after restriction enzyme digestion followed by PCR amplification suggests the presence of methylation at the tested locus. This reflects a silent state on the inactive X, whereas the absence of a band indicates hypomethylation associated with expression from the inactive X. As a control to ensure that the absence of a PCR band reflects lack of methylation at the CpG island rather than the mere absence of DNA in the digest, a mock control was performed under identical conditions using glycerol/buffer solution in place of the enzyme and buffer mix. This also served to show the band size for the species in question.
Absence of a band in this mock control was due either to poor optimization of primers or bad primer design, or to the degradation of DNA in the digests. When possible, PCR conditions were adjusted, new primers were designed, or new digests were made. To ensure that the presence of a band in the HpaII digest reflects methylation at the CpG island of interest rather than incomplete restriction enzyme digestion, an MspI cutting control was used. Because MspI recognizes the same DNA site as HpaII but is not methylation-sensitive, performing this digest under identical conditions as HpaII first confirms that the locus being amplified contains an HpaII site, and second allows for direct comparison of the extent of digestion between the two enzymes. Remnant bands in the MspI control represented insufficient time or poor cutting conditions leading to incomplete digestion, or reflected an excessive number of cycles for the PCR reaction. Samples were either redigested or the number of cycles was decreased to counter these problems. Degenerate primers were designed for those loci of interest (Table 3.1) possessing CpG islands lying within 2 kb from the 5' end (Santa Cruz genome browser) for amplification across the four eutherians. Methylation digests were performed on female and male coast mole, cow, human, and mouse fibroblast-derived DNA, and three sets of reliable primers were used for PCR amplification of non-CpG island loci to make sure that all digests were positive for DNA (Figure 3.4). For digests that showed weaker PCR amplification, the amount of template was adjusted until all samples showed equal intensity of PCR bands. The established amount of template was then held consistent for all PCR reactions to amplify X-linked CpG islands of interest.
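Degenerate primers such as those listed in Table 3.1 are written with IUPAC ambiguity codes (for example, R = A or G; Y = C or T), so each primer string stands for a pool of concrete oligonucleotides. As an illustration only, the sketch below counts and enumerates the sequences encoded by a degenerate string. The function names and the short example primer are hypothetical and are not taken from Table 3.1.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_primer(primer: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical six-base example (not a primer used in this study).
example = "ATGRCY"
print(degeneracy(example))
print(expand_primer(example))
```

Because the pool size is the product of the options at each ambiguous position, heavily degenerate primers dilute any single matching sequence, which is one plausible reason some primer pairs need extensive optimization.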
For the methylation analysis, I chose two loci that reside within the long arm of the X chromosome in the XCR, and which are subject to inactivation in both humans and mice, to test in the cow and coast mole. The purpose of this was to confirm that genes that are silent in both mouse and human show similar status in all eutherians. Indeed, for all species tested, both Fmr1 and Ar displayed a band in each of the mock female and male controls, along with each of the female HpaII lanes, but not in the male HpaII or any of the MspI digests (Figure 3.5). These results suggest the presence of methylation at the Fmr1 and Ar CpG islands in the eutherians tested, as expected for loci that are part of the evolutionarily older X, where there is a general depletion of functional Y homologs (methylation results are summarized in Figure 3.6).

Table 3.1. Degenerate Primers Used for Methylation Analysis at CpG Islands of X-linked Genes. Optimized primers work for cow, mole, human, and mouse genomic DNA.

Primer Pair | Gene | Sequences 5' to 3' (forward and reverse primers, run together) | MgCl2/Betaine | Cycling Conditions | Cycles | R.E. Sites Flanked | Product Size
CRSP2-CpGF, CRSP2-CpGR | Crsp2 | TACAGRGGGCRGMGGTGAGRAGGGRCGCSCGCCTCAAYTGCRCCGARTACAA | 1.5mM; 1M Betaine | 94-1 min, 54-1 min, 72-2 min | 40X | HhaI (1X), HpaII (3X) | 375bp
UTX-CpGF, UTX-CpGR | Utx | CCTCGYGGAGGCYATTATTTCYAGCCAGAGAATGRAGGGTCCVGGCYGKGTC | Failed | | | HhaI (8X), HpaII (5X) | 354bp
UTX-CpGF, UTXCpGR2 | Utx | CCTCGYGGAGGCYATTATTTCYAGCCCSARWGGSAGCWKSYKGTTAGGTTG | Failed | | | HhaI (8X), HpaII (5X) | 403bp
UTXCpGRS, UTXCpGF2 | Utx | CCTTCRTYCTGGCGCCATCTTCATGA | Failed | | | HhaI (1X), HpaII (3X) | 240bp
UTXCpGF3, UTXCpGR3 | Utx | GGTGATGAGGRAAAGAAAATGGCGCCTTCRTYCTGGCGCCATCTTCATGA | 1.5mM; 1M Betaine | 95-1 min, 54-1 min, 72-2 min | 35X | HhaI (1X), HpaII (4X) | 368bp
UBE1-CpGF, UBE1-CpGR | Ube1x | ATGATTCATRARTRGGCGCGGGGTMYGAYBYCAAGGTCAGATTT | 1.5mM | 94-1 min, 56-1 min, 72-2 min | 35X | HhaI (4X), HpaII (7X) | 544bp
UBE1-CpGF, UBE1CpGR2 | Ube1x | ATGATTCATRARTRGGCGCGGGTTCTGAYMKGMRATRCAWGGYTCDGGA | Failed | | | HhaI (5X), HpaII (6X) | 630bp
PCTK1CpGR3, PCTK1CpGF3 | Pctk1 | GTGCCAGTAGTCTTCRGCCATTTTGGTACAYGCAGTCCGAGGTGA | 1M Betaine (Human, Mouse); 2M and 3M Betaine (Cow, Mole) | 95-1 min, 54-1 min, 72-2 min | 35X | HhaI (9X), HpaII (3X) | 350bp
PCTK1-CpGF, PCTK1-CpGR | Pctk1 | GGAGGARAAGGAGGTCGCGCGCTDCCCAWCCYCAGCTCCYAGRMC | Failed | | | HhaI (3X), HpaII (4X) | 329bp
PCTK1-CpGF, PCTK1CpGR2 | Pctk1 | GGAGGARAAGGAGGTCGCGCGTCGVGGACRCGCTCACCGGMG | Optimizing | | | HhaI (2X), HpaII (3X) | 132bp
FRAX2, FRAX7 | Fmr1 | GCTCAGCTCCGTTTCGGTTTCACTTCCGGTAGCCCCGCACTTCCACCACCAGCTCCTCCA | 2mM; 2M Betaine | 94-1 min, 50-1 min, 72-1 min | 35X | HhaI (2X), HpaII (3X) | 250bp-400bp depending on species
NEMOCpGF, NEMOCpGR | Ikbkg | TAYGACACCGGAAGCCGGAAGAARARGACCACACCTGTCAGCAG | Untested | | | HhaI (2X), HpaII (3X) | 165bp
ZFX1, ZFX2 | Zfx | GAGCTCGGAGCTGACAAAAACTACCCTTCCGCATTTTCCT | 2mM; 2M Betaine | 94-1 min, 50-1 min, 72-1 min | 35X | HhaI (2X), HpaII (3X) | 105bp
SMCX1, SMCX2 | Jarid1C | CCTCGGGCCCACCATGGACTGATTTTCGCGATGTAGCC | 0.5mM; 1M Betaine | 94-1 min, 56-1 min, 72-1 min | 35X | HhaI (2X), HpaII (2X) | 117bp
EIF2S3F, EIF2S3R | Eif2s3 | CCTBAGYVTTGCCTRCMCARAWATCTCACTCCASCYTCDCCSCCMGCCAT | Failed | | | HhaI (1X), HpaII (2X) | 240bp
COWARFOR, COWARREV | Ar | TCGAGTGCAGCACCTTCCGGCGGGTGCCGGCCTCGCTCAGGATGT | 1.5mM | 95-5 min; 95-1 min, 62-1 min, 72-2 min | 40X | HhaI (4X), HpaII (4X) | 380bp
E6R, CME6R (Mole DNA Control) | Xist | GCAGAGACACTGAAGCACACAATGTCTTACCCATTTCCATGATTC | 1.5mM | 94-1 min, 54-1 min, 72-2 min | 35X | - | 430bp
mUBE1A, mUBE1B (Mouse DNA Control) | Ube1x | AGCTGTGCTGCAACGATGAAGTCTTGAGGTTGCTGGGTA | 1.5mM | 94-1 min, 56-1 min, 72-1 min | 35X | | 196bp
CMXIST12, CMXIST13 (Human and Cow DNA Control) | Xist | TTCTCAGMAGTKCTGGCACATCTGTTCTTTTGAGATGTMCTTTTTGATGTT | 1.5mM | 94-1 min, 52-1 min, 72-3 min (Human); 94-1 min, 49-1 min, 72-3 min (Cow) | 35X | | ~400bp (Human); ~300bp (Cow)

Figure 3.4. Control PCRs for Methylation Analysis. Panels: human, cow, mole, mouse. The equal presence of X chromosome DNA for all digested and mock-digested samples was checked using independent primers from those used in the methylation survey. Cow, human, and coast mole digests were amplified with a degenerate pair of Xist primers; mouse digests were amplified using mouse Ube1 primers that did not surround HpaII cut sites.

Figure 3.5. Methylation Analysis of X-linked Loci in Four Eutherians. The genes and their respective locations are shown on the left in relation to the human X chromosome. The corresponding gels are shown on the right for mole, mouse, cow and human. Each panel includes the mock, MspI, and HpaII reactions for female and male as labeled. X added region (XAR) is in green, while X conserved region (XCR) is in blue on the human X chromosome.

Figure 3.6. Summary Diagram of Methylation Analysis Results. Columns: human, mouse, cow, mole. Loci, in Mb from the p terminal of the human X chromosome: ZFX 23.9; CRSP2 40.3; UTX 44.5; UBE1 46.8; JARID1C 53.1; AR 66.5; FMR1 146.7. Mouse shows unique inactive status compared to the non-rodent eutherians in the XAR and most distal XCR. The proximal XCR shows variable status among two eutherians and escape status in the others. I = inactive; E = escape; +/- = presence or absence of Y homolog, respectively. Testes = testes-specific expression of Y homolog. The locations of the loci are indicated on the left.

To address the idea of unique escape status in human, Jarid1C and Ube1 (short arm XCR, stratum 2), which escape silencing in human but not in mouse, were tested in the other two mammals. For Jarid1C, bands were observed in the female mouse and cow HpaII lanes, but not in the corresponding lanes in human and mole, when mock controls were positive and MspI digests were negative. These results indicate that the mouse and cow Jarid1C genes are subject to inactivation, whereas the mole and human genes show escape. However, when using DNA from a different female cow (IVF), the absence of a band in the HpaII lane suggests that the DNA is unmethylated and Jarid1C escapes inactivation (the result was replicated three times) (Figure 3.7).
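The lane-reading logic applied to these gels can be summarized in a few lines. The sketch below is an illustrative restatement of the decision rules described for the mock, MspI, and HpaII lanes of a female sample, not software used in this study; the function name and the returned labels are hypothetical.

```python
def call_female_locus(mock_band: bool, msp1_band: bool, hpa2_band: bool) -> str:
    """Interpret one locus from the three lanes of a female sample.

    mock_band  -- band in the mock (no enzyme) digest
    msp1_band  -- band in the MspI (methylation-insensitive) digest
    hpa2_band  -- band in the HpaII (methylation-sensitive) digest
    """
    if not mock_band:
        # No product without enzyme: primers failed or DNA is degraded.
        return "uninformative: no band in mock control"
    if msp1_band:
        # MspI should always cut: digestion incomplete or too many cycles.
        return "uninformative: remnant band in MspI control"
    if hpa2_band:
        # Template survived HpaII, so the site is methylated.
        return "methylated: subject to inactivation"
    return "unmethylated: escapes inactivation"


# Jarid1C lane patterns as described in the text (mock positive, MspI negative).
print("mouse:", call_female_locus(True, False, True))
print("human:", call_female_locus(True, False, False))
```

The two control checks come first by design: a call on the HpaII lane is only meaningful once the mock lane shows that template is present and the MspI lane shows that digestion went to completion.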
For Ube1, the gene escapes silencing in both human and cow, but is subject to inactivation in mouse. The status at this locus is unknown for the coast mole, as these primers failed to amplify a specific band of the expected size, but instead produced multiple non-specific bands under a variety of conditions. In addition, even after many redigestions, bands were seen in the mole male HpaII digests but not in MspI lanes. If any of these bands represents the Ube1 locus, the CpG island would be normally methylated in both males and females. In humans, there is no X-linked locus known to date which is normally inactive in both males and females, other than Xist. The closest situations are SYBL1 and HSPRY3, which are subject to inactivation on both the human inactive X and Y [43, 99].

To test whether mouse shows unique inactivation, two XAR loci, Zfx and Crsp2, that escape inactivation in human but not in mouse were assayed. For both loci, bands were absent in the coast mole, cow, and human HpaII lanes, suggesting hypomethylation at the corresponding CpG islands. This pattern indicates an escape status in coast mole, cow, and human for Zfx and Crsp2, compared to the inactivation status in mouse.

Finally, to confirm whether loci with concordant active status in human and mouse are expressed on the inactive X in other eutherians, the Utx gene (XAR), which escapes inactivation in both human and mouse, was tested. The coast mole and cow HpaII digests failed to amplify bands, despite obvious bands in the mock control. Thus, the Utx expression pattern in both moles and cows resembles that seen in human and mouse.

3.4) Discussion

3.4.1) Evidence of X-linked Loci in Cow and Mole

To draw any conclusions from the present methylation analysis, it was important to know whether the loci tested in fact reside on the X chromosome in all four species.
Confirming the locations was not a problem for human, mouse and to some extent cow, since their X chromosomes have been sequenced to entirety (Figure 3.8). However, several X-linked orthologs of interest (Crsp2, Utx, Ube1, Jarid1C) have not been "placed" on the cow X chromosome map and they lie at undefined positions in the cow genome (NCBI, UCSC). Nevertheless, MaoA, which is near Utx, and Syn1, which is near Ube1 and Pctk1, are mapped onto the cow X chromosome according to the 2005 cow radiation map available in the BOVMAP database (http://locus.iouy.inra.fr/), albeit at unknown locations. At present there is no sequence information on the coast mole X chromosome. However, Zoo-FISH experiments have confirmed whole X chromosome synteny in all eutherians tested [123]. This is consistent with Ohno's law first hypothesized in 1973, which states that the gene content of the traditional mammalian X chromosome (not including the PAR) would be highly conserved across taxa because of strong selection to maintain dosage compensation. Examinations of gene order in rodents [124] and artiodactyls [125, 126] reveal a fair degree of rearrangement relative to the human X chromosome. The gene order of the rat X chromosome is drastically different from that of mouse, although both rodents do share the similarity of acrocentric sex chromosomes (NCBI). In cow, Fmr1 is located at Xp (instead of Xq as in human) [127], on the opposite arm from cow Xist (Xq23) [127], and the centromere has shifted in cattle compared to human [128]. Due to an inversion, Ar and Rps4x are still located on the q arm of cow, similar to human [127]. According to the map in several recent papers, Zfx has been shuffled to Xq34 in cow, compared to its location at Xp21 in human [127, 128].

Figure 3.7. Potential Developmentally-Dependent Expression Status of Jarid1C in Cow. Methylation analysis of cow DNA derived from in vitro fertilization (top) compared to cow DNA derived from the pulmonary artery of a young female (bottom). DNA from the male cow cell line CCL207 was used in both cases. Left panels show positive control for the presence of DNA using Xist primers amongst all digested and mock-digested samples, while right panels show Jarid1C amplification of the same samples.

Figure 3.8. Comparison of Eutherian X and Y Chromosomes. Panel A species: cat, dog, rat, mouse, human, cow. Panel B species: cat, mouse, human, cow. A) The relative order of tested loci in the present methylation analysis. The loci of interest (and their Y homologs in panel [B]) are highlighted in red. The loci indicated with (?) in cow have been mapped onto the X chromosome, but the relative positions of these loci are unknown. Known expression status of the loci in mouse, human, and cow are indicated as (E) for escape status or (I) for inactive status. B) Y chromosomes of mouse, human, and cow, with cat and pig Y chromosomes shown as comparisons. The precise location of the centromeres and size of the chromosomes may not be accurate. Positions of loci are not drawn to scale [124, 126-128 and NCBI].
Zoo-FISH analysis on the common shrew (belonging to the same order Insectivora as the coast mole) again shows that the human X arm is conserved in its entirety onto a single chromosome of the shrew, although this chromosome also consists of human chromosome 2 genes and lacks human Xp material [129]. Different shrews possess X-linked Zfx, suggesting that moles, which are also insectivores, also possess this gene on the X chromosome (NCBI). Recent reconstructions of the ancestral eutherian karyotype using the parsimony principle (which assumes that chromosomes identical in species belonging to different taxa are likely to be present in their common ancestor) predicted that ten ancestral chromosomes were homologous to entire human chromosomes: 5, 6, 9, 11, 13, 17, 18, 20, X and Y. The ancestral karyotype resulting from other studies ranges from 2n=44 to 2n=50, resembling the karyotype of human [130]. This gives confidence that the loci investigated in this study are indeed located on the X in all eutherians.

3.4.2) Implications of Factors in Escape

Generation Time

The findings in the present methylation analysis showed that three out of four genes expressed differently between human and mouse (Zfx, Crsp2, and Ube1) are inactivated in mouse but not in any other eutherian tested (Figure 3.6). The short generation time in mouse might explain its more complete silencing. Mouse has a gestational time of three weeks, while the coast mole has a similar gestational time of four weeks, compared to the 266 days in human and 277-290 days in cow [131] (http://www.infoplease.com/ipa/A0004723.html). However, the coast mole breeds during January to March and produces a single yearly litter consisting of 3-4 young, compared to the mouse which breeds throughout the year (http://www.dfg.ca.gov/whdab/html/M017.html). In addition, female moles are sexually mature at 9-10 months compared to 2 months in mice (http://imnh.isu.edu/digitalatlas/bio/mammal/insec/mole/como.htm).
In cattle, females begin to mate at 18 months of age, with mating taking place throughout the year (www.oaklandzoo.org/atoz/azcattle.html). Since Zfx and Crsp2 (XAR genes) escape inactivation in moles (with longer generation times than mice), resembling the pattern seen in cattle and humans, more complete inactivation in the XAR in mice could be due to accelerated evolution. Humans may show abundant escape compared to the other three eutherians due to their long generation time. Unfortunately, the mole Ube1 expression status is still unknown. However, the escape status of cow and human Ube1 (XCR) strengthens the assertion that mouse, with its short generation time, displays unique inactivation status for several genes on the X chromosome, independent of the age of the region (XCR or XAR). Therefore, other than generation time, any mechanism that leads to increased mutation rates relative to other species, such as high metabolism (which is thought to increase the amount of oxygen radical damage to DNA and correlates with small body sizes), also merits further investigation regarding its effects on the proportion of genes that escape inactivation on the X chromosome.

Constitutive Heterochromatin

The fact that Xist does not need to traverse constitutive heterochromatin might also explain the discrepancy between mouse and the non-rodent eutherians investigated here. The simplistic view is that centromeres at the termini of X chromosomes would allow the Xist RNA to coat and silence the chromosomes more efficiently. Cattle have a diploid number of 60 comprising acrocentric autosomes and a pair of metacentric sex chromosomes. Of mole chromosomes investigated to date, the centromeres of the majority of the species occupy a metacentric or submetacentric position, with only one out of the eight species showing an acrocentric position (in only one pair of homologs), usually with a diploid number of 34 [132, 133].
Although the coast mole has not been karyotyped, it is likely that it also possesses biarmed sex chromosomes with centromeres near the centre, rather than near the termini. Human, cow and mole with metacentric X chromosomes all show escape at Zfx and Crsp2, compatible with constitutive heterochromatin playing a role in escape. However, in the Jegalian and Page study (1998), Zfx was shown to escape silencing even in sheep, which have acrocentric chromosomes [100]. The observation that the cow Zfx gene is relocated to the Xq arm but still escapes inactivation, as in human, argues against a role for constitutive heterochromatin in hampering Xist spread and subsequent silencing of this region. In addition, Fmr1 on the cow X chromosome seems to have relocated from the long arm to the p arm, but again the centromeric barrier between Xist on the q arm and this locus does not seem to hinder silencing. Therefore, a factor other than constitutive heterochromatin must contribute to the species differences in inactivation.

Distance from the XIC

Because cow Zfx is located at a distal position (Xq34) on the same arm as Xist [127], the large separation from the Xist locus might account partly for the gene's escape status in this species. On the other hand, two out of four genes on PAR2 in humans, located at a comparable distance from XIST, are subject to inactivation on both the X and Y chromosomes. Furthermore, mouse genes, whether located near to or far from the XIC, are nevertheless stably silenced; the silencing status of genes along the mouse X chromosome seems to be uniform, without a gradient due to distance. As mentioned before, the cow Fmr1 locus is relocated to the Xp arm, presumably increasing the distance between the gene and Xist (Xq23) [123]. Despite the distance, Fmr1 remains subject to inactivation.
These lines of evidence make it unlikely that distance from the XIC accounts for expression differences of X-linked genes between species.

Evolutionary Age of the Region

Genes located in the XCR on the short arm of the human X chromosome (stratum 2), Jarid1c and Ube1, that escape silencing in human but are subject to inactivation in mouse exhibit a variable pattern in the mole and cow. For Jarid1c, expression in cow is similar to that in mouse (inactive), whereas expression in mole is similar to that in human (escape). As for Ube1, in at least one other mammal, the cow, the gene escapes inactivation as in human. Therefore, humans do not appear to be unique in allowing the expression of loci on the inactive X located within the short arm XCR. It is interesting to note that stratum 2, which contains Jarid1c and Ube1, may not have been part of the original ancestral pair of autosomes from which the sex chromosomes evolved (syntenic to chicken 4p). Instead, stratum 2 maps to distinct autosomal origins and was likely added onto the original pair before the prototherian-therian divergence [90, 105]. Jarid1c shows inconsistent expression in mouse depending on the time of development, being inactivated initially during early embryogenesis and escaping as development proceeds. Along with the tissue and timing variability seen with Jarid1c, some genes in this region also show variability in escape between human females (e.g. Timp1); genes such as these tend to lack 5' CpG islands, as suggested by a recent analysis in humans [99]. One explanation for this variability is that even though the stratum 2 XCR existed on the X prior to the development of dosage compensation, it originated from a different genomic context than the traditional XCR (stratum 1). It could lack cis-elements or epigenetic factors necessary for stable inactivation, or contain sequence features more resistant to Xist silencing than the traditional XCR.
Eventually, this could have led to different eutherian lineages acquiring different expression profiles and evolving different mechanisms to maintain their expression or inactivation. The status of Ube1 in coast mole, and of Pctk1 (another gene within the region), might clarify whether each eutherian bears a distinct expression status. At loci whose expression concurs in human and mouse, the status seems to also be similar in other species. Utx (XAR), Fmr1 and Ar (XCR), whose status coincides with that expected from evolutionary age (XCR genes that are subject to inactivation, or XAR genes that escape silencing, in human and mouse), show the same status in cow and mole. One explanation for the consistent expression within Eutheria is that the dosage of these genes is crucial and therefore maintained throughout evolutionary time. For these genes, it is likely that all eutherians share ancestral regulation mechanisms due to strong evolutionary pressures to maintain the correct dosage of the gene products.

Presence of Y Homologs

One question is whether the silencing status of genes in different species reflects the differential retention of Y homologs in various lineages. Understanding the functions of the loci tested and their Y homologs might help to explain why genes require differential escape or inactivation.

Zfx

Zfx encodes a zinc finger protein with transcriptional activation activity, containing the Cys(2)His(2) (C2H2) zinc finger [134]. In humans, both this gene and its Y homolog are ubiquitously expressed in all tissues tested and at all stages of organismal development (GeneCards, http://bioinfo.weizmann.ac.il/cards/index.shtml). On the other hand, the mouse Zfy has functions confined to germ cells (OMIM, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM).
Although a variety of carnivores (cat, dog, cheetah, jaguar, leopard, and lynx) and artiodactyls (pig, cow, sheep, goat, bison and horse) possess the Zfy gene (NCBI), it is unknown whether Zfy is present in the mole and whether it is ubiquitously expressed in the cow. If the expression pattern of the cow Zfy gene is similar to that of human, the need for dosage compensation at this locus would explain the escape status in human and cow, compared to the inactive status in mouse, in which expression of Zfy has become testis-specific. Unfortunately, there is not enough information on Zfy expression or the mole Y chromosome to draw any conclusions on whether the locus complies with the Y homolog "rules" for inactivation, although the situation in human and mouse suggests this is true.

Crsp2

As its name implies, Crsp2 (cofactor required for Sp1 activation) encodes a nuclear cofactor that forms part of a large multiprotein complex, Crsp, required for Sp1 to initiate transcription [135]. In human and mouse, the Crsp2 gene is ubiquitously expressed, but mouse contains no Crsp2 Y homolog, whereas human contains a Y pseudogene (OMIM and [99]). No Crsp2 Y homolog has been detected to date in the available cow and mole sequences deposited in the NCBI database, or in radiation hybrid, linkage, or physical maps. Although the requirement for dosage equivalency between males and females explains Crsp2 inactivation in mice lacking the Y homolog, this same explanation is insufficient for the cow and mole, which also lack Y homologs. In addition, since the Y pseudogene in human by definition does not encode functional transcripts, the absence of final dosage differences between males and females contradicts the observed escape status of Crsp2 in human. Thus, the presence of Y homologs does not account for the Crsp2 expression status on the inactive X chromosome across eutherians.
Ube1x

Ube1x encodes a ubiquitin-activating enzyme that catalyzes the first step of ubiquitin conjugation necessary for diverse cellular processes, including selective protein degradation, DNA repair, nuclear DNA replication/synthesis, and progression of the cell cycle (OMIM). Consistent with its important cellular functions, Ube1x is widely expressed in mouse [136]. Ube1y is found in many eutherians, a metatherian (opossum), and even a prototherian (platypus), but is nevertheless absent in humans [136]. No Ube1y homolog was found in the cow or any insectivores based on sequence data in NCBI Map Viewer or through BLAST. Consistent with its proposed role in spermatogenesis, Ube1y expression is limited to the testis in mouse [136]. Thus, it is understandable why X-linked Ube1x is subject to X inactivation in mouse, because this gene is essentially expressed in one dose in somatic tissues of both males and females. However, this X-Y gene pair is differentially regulated, being expressed at different ages and in different tissues, signifying that Ube1x and Ube1y are not functionally equivalent [137]. In human and other eutherians (cow and mole) that do not appear to possess Ube1y, Ube1x escapes inactivation, suggesting that the exact amount of final products from the Ube1 locus is not crucial. The absence of Y homologs does not explain the expression status of Ube1x in human, cow and mole, although the expression pattern of the Y homolog explains the status in mouse.

Utx

The Utx (X-linked ubiquitously transcribed tetratricopeptide repeat [TPR]) gene, which is expressed from the inactive X chromosome in both mice and humans, has a Y homolog, Uty, that encodes a protein with 8 TPR motifs, believed to mediate protein-protein interactions. The UTY protein confers H-Y antigenicity on male cells, known to be involved in stem cell graft rejection between males and females.
Because the human UTY gene maps to band 5C, known to contain spermatogenesis genes, Uty may similarly be involved in spermatogenesis (OMIM). Both Utx and Uty are widely expressed in all tissues of human and mouse (OMIM and GeneCards). Uty is found in cow (GI: 17933095), human, mouse, and shrews (GI: 60267961, 60267959), whereas no evidence of this gene was found in dog, rat, and mole based on available sequence data in NCBI. Since Utx escapes inactivation in all four eutherians tested here, three of which are known to possess the Y homolog, it is likely that the coast mole also contains a homolog on the Y chromosome, because it shows a similar Utx escape pattern (the online databases likely lack information on the mole). Based on these results, I conclude that the dosage of the Utx/Uty protein products is important and predicts whether Utx would be expressed or subject to inactivation. Eutherians lacking the Uty homolog would be expected to display an inactive status on the X chromosome.

Jarid1c

Jumonji AT rich interactive domain 1C (JARID1C; formerly SMCX) is an evolutionarily conserved protein involved in transcriptional regulation and chromatin remodeling (OMIM and [138]). JARID1C transcripts are detected in many human adult tissues, with the highest expression in brain and skeletal muscle and the lowest expression in the heart and liver [138]. On the other hand, the Y-linked JARID1D, or Jumonji AT rich interactive domain 1D (formerly SMCY), is expressed only in pancreas, lung, brain, and skeletal muscle, at much lower levels than JARID1C [138]. The Jarid1d homolog is found in cow, cat, mouse, and human but not in rat, dog, or insectivores, based on available information (NCBI; physical, linkage and RH maps). Therefore, relying solely on Y homolog differences, I would expect the coast mole Jarid1c to be subject to inactivation.
Instead, the mole gene escaped inactivation, similar to human JARID1C, and the cow gene showed an inactivation status similar to that of mouse Jarid1c (early mouse development). In addition, Jarid1c and Jarid1d may not be functionally equivalent. In mice, Jarid1c is expressed at significantly higher levels in the adult female brain relative to the male brain, yet the expression of Jarid1d in males is insufficient to compensate for that difference [137]. Therefore, the presence of the low-expressing Y homolog in males likely does not necessitate a second dose in females, although a second dose might account for sex-specific differences. Because the expression of mouse Jarid1c on the inactive X increases with developmental time [108, 109, 114], establishing such sex differences would occur during late development. In the present study, cow Jarid1c was observed both to be subject to inactivation in a CCL209 female cell line and to escape inactivation in a sample derived from an in vitro fertilization (IVF) female (Figure 3.7). The cow gene might be subject to inactivation due to low expression of the Y homolog, as in mouse, but there is currently no information on expression levels of the cow Jarid1d. Unfortunately, the developmental stage from which the IVF sample was derived is unknown, although the CCL209 cell line is derived from the pulmonary artery of a young female cow. Experiments using 85- and 95-day-old fetuses from pregnant uteri, as well as bovine cells from skin biopsies with sex aneuploidy [139], have revealed that Jarid1c is subject to inactivation during later developmental stages and adulthood, a result replicated in this study using the CCL209 cell line. If the CCL209 DNA is from a cow of a later developmental stage than the IVF cow, then the status observed in cow is opposite to the escape status preceded by silencing seen in mouse.
Taken together, this study presumes that sex differences resulting from differential expression are achieved during an early developmental stage in cow. However, the variability seen between cow samples could reflect differences between individuals or cell types rather than developmental time. This possibility can be assessed by comparing the methylation status of Jarid1c in embryos of known developmental stages and by testing cell lines from different tissues and individuals.

Ar and Fmr1

The CpG islands of Ar and Fmr1 were methylated in the four species tested in the present study. AR encodes the androgen receptor, a ligand-activated nuclear transcription factor, which is also a serine/threonine protein kinase. The active androgen-receptor complex regulates the activity of androgen-responsive genes, which direct the development of sexual characteristics, such as hair growth and sex drive, in both sexes, as well as male-specific sexual characteristics. In human, AR is expressed in many of the body's tissues, whereas in mouse, it is predominantly expressed in urogenital tissues. Mouse, human, cow and mole lack Y homologs for this gene. That AR/Ar is subject to inactivation in all four species reflects the importance of precise dosage for regulating sexual characteristics, despite different tissue specificities. Fmr1, which is infamous for its role in fragile X mental retardation when functionally lost, is implicated in dendritic development. As with Ar, Fmr1 bears no Y homologs in the four species tested, characteristic of genes located in the evolutionarily old part of the X chromosome. The X-linked copy of the gene was subject to inactivation in all four eutherians examined in this study, as would be expected of a gene that holds critical functions and is expressed in only one dose in males.
3.4.3) Summary of Factors in Escape

In conclusion, the presence of Y homologs correlated with the escape status of the X-linked loci Utx, Ar, and Fmr1 (perhaps Zfx), but not Crsp2, Ube1x or Jarid1c. The first three loci (Utx, Ar, and Fmr1) showed concordant status in all four eutherians, and the absence of Y homologs correlated with the evolutionary age of the region. The dosage of the final products from these genes is presumably important, and thus regulation of appropriate expression in males and females has been maintained in different eutherian lineages. Of the three genes whose retention of Y homologs did not explain the need for escape, two are located within the stratum 2 XCR, and all three showed discordant expression status between human and mouse. These genes may possess epigenetic differences across species that could account for the expression differences. The results of the present study support the idea of gene-specific regulation, since the hypothesized reasons for escape at one locus did not always apply to another. It is likely that the expression profiles of X-linked genes across species result from multiple factors that contribute differently depending on the gene. Short generation times make it more likely that Y homologs are lost in evolutionarily new regions, and the lack of constitutive heterochromatin facilitates the spreading process. Large distance from the XIC makes genes less prone to inactivation, but the initial predisposition of different genes by epigenetic marks varies. Species with acrocentric chromosomes could have long generation times that counter the need for inactivation because Y homologs are still retained in the evolutionarily newer regions. Hence, meaningful comparisons are best achieved using mammals with as many similar features as possible other than the factor tested.
Many additional genes and species are necessary to rigorously test the different factors associated with the regulation of escape versus inactivation. Future epigenetic analyses would also clarify whether genes that escape inactivation in different species share the same epigenetic modifications.

Chapter IV General Conclusion

Whether it is Xist sequence or regulation of Xist by Tsix, rodents appear to be outliers in the eutherian infraclass. Results from the present conservation and methylation analyses further suggest that the secondary structure of the Xist transcript and the extent of silencing along the inactive X chromosome are unique in rodents compared to other eutherians. Although the use of the mouse model to study X inactivation has provided tremendous insight into the process of dosage compensation over the past decades, overall, the present comparative studies have underlined the importance of species-specific differences and cautioned against relying solely on mice to make generalizations about all eutherians. For the first time, secondary structure conservation of a transcript the size of Xist has been studied using data from more than three eutherian species and an RNA program designed to predict common structures in diverged but related sequences. Results from this analysis have revealed conserved structures for the sequence before the A repeat, the A repeat, the F repeat, and exon 4. The confirmation that the A repeat and exon 4 stems are conserved within the Xist sequence in multiple eutherians argues that these regions carry an important biological role in the X-inactivation process common to all eutherians. Considering its role in silencing in mouse, the A repeat potentially recruits players involved in the establishment of silencing in all eutherians. However, the conserved A repeat stem predicted in the present study is different from the structure previously predicted by Mfold for single Xist orthologs [64].
This emphasizes that not all hairpins predicted to be stable are conserved and raises the possibility that the conserved stems formed from adjacent A repeats are responsible for silencing, rather than the two stem loops formed within each A monomer. Past studies which demonstrated that mutations of A repeats disrupted silencing involved changing sites crucial for base-pairing of stems internal to each repeat [64]. According to the location of the conserved stems found in this study, the sites mutated are the same sites necessary to form stems between A repeats. Hence, the past mutational studies do not allow us to clearly distinguish the roles of the formerly predicted stems and the stems found to be conserved here. On the other hand, exon 4 forms a stable stem loop in all of the eutherians compared in the present study, identical to that predicted previously. Since exon 4 deletion in mouse led to no apparent changes in localization and random inactivation [69], it is likely to have no impact on the same processes in the other eutherians. Instead, the exon 4 hairpin could contribute to the stability of the Xist transcript [69]. In this study, two new Xist regions were found to share common secondary structures across seven eutherians: the sequence before the A repeat and the F repeat. Because the A repeat is in close proximity to the 5' end, it is plausible that the transcriptional silencing machinery also binds to the sequence before the A repeat. Alternatively, conservation of this sequence could be a result of genetic hitchhiking. On the other hand, the F repeat conserved structure could mediate DNA binding, since the F repeat in seven eutherians has a lower tendency to form secondary structures than shuffled sequences bearing the same base composition. This region could affect the mobility of Xist by binding to the inactive X and/or interacting with SAF-A [68]. The present study uncovered Xist secondary structures that are distinct to rodents.
Conserved RNA folding of the unique exon 5, which is not shared with non-rodents, could confer an advantage to rodents in the stability of the transcript, as it is located in close proximity to exon 4. Furthermore, differences in the structures between the A and F repeats suggest that the process of silencing or mobility is altered between rodents and non-rodents. The RNA folding within the last kilobase of the D repeat was conserved only among the rodents investigated. Since deletions of the D repeat did not affect silencing in mouse [64], this region must act downstream of silencing; in mice, this repeat takes part in both macroH2A recruitment and localization. Structural differences in the D repeat across species suggest that these processes are altered between rodents and non-rodents. This difference in D repeat structure could explain why human XIST only partially coats and silences the mouse autosome from which it is expressed [76]. Differences in recruitment of macroH2A and localization, as well as downstream effects such as maintenance, have not been investigated in cow, dog, rat, vole, or mole. Such studies on macroH2A recruitment and localization would be useful to test the idea of species variation due to secondary structure differences. My work suggests that rodents have undergone more rapid Xist sequence and RNA structure divergence compared to other eutherian lineages. Given the differences between the rodents' Xist secondary structures and those of non-rodent eutherians, it is not surprising that rodents show differences in their extent of silencing along the X chromosome, if the shape of the Xist transcript bears any impact on the affinity of the RNA for the X chromosome and/or its subsequent silencing ability. This difference could partly account for the variation seen in the extent of silencing on the X chromosomes between eutherian species, as observed in the methylation analysis presented here.
In the current methylation analysis, used to test expression status along the X chromosome across different mammals, the mouse showed methylated status for three out of four X-linked genes that were consistently unmethylated in other mammals of distinct eutherian lineages. This study supports the idea that mice are distinct in their complete spread of silencing on the inactive X due to accelerated evolution of the chromosome from short generation times. On the other hand, I found no evidence that the distance from the XIC or the need for the Xist transcript to cross through constitutive heterochromatin directly plays a role in affecting the expression of X-linked genes in the mammals tested. Furthermore, for many loci seen here, and as is also notable from a recent extensive analysis by Carrel and Willard (2005), numerous X-linked genes escape inactivation despite the absence of their corresponding Y homologs [99]. The presence or absence of Y homologs adequately explains the methylation status of those loci that show similar expression in human and mouse. Gene dosage from these loci is presumably important, and the retention of Y homologs coincides with the evolutionary age of the region. Since the methylation status of these genes is consistent across eutherian lineages, conserved regulatory mechanisms likely exist at these loci to control for appropriate expression during X inactivation. The discovery of variable Jarid1c expression between cow samples suggests that variability is not uncommon within the eutherian infraclass. As monotremes and marsupials normally show variable expression of X-linked genes, the eutherian variability could simply reflect common ancestral origins. The incomplete inactivation observed on the human, cow, and mole X chromosomes also resembles the incomplete inactivation seen along the marsupial X chromosome, although mouse instead shows nearly complete inactivation.
This study strengthens the argument that the mouse X chromosome is in a state of acquiring more complete inactivation compared to other eutherians, possibly due to the short generation time of mice or any mechanism leading to an increased mutation rate in the rodent, and/or the establishment of epigenetic factors necessary for efficient inactivation. Humans, however, might be at the other extreme, allowing a large number of genes to escape silencing on the long arm of the X [99], regardless of whether these same genes are located on the long arm or short arm in the other eutherians. Investigation of the expression status at these loci in other mammals will test whether the human expression profile is unique. The retention of Y homologs, as for RPS4X, might explain the need to escape for some human XCR genes [100]. Long generation times in human could account partially for the escape observed along the human X chromosome. Long arm XCR genes, such as IKBKG, that escape inactivation in human should be investigated in other mammals to test this idea. Although the current methylation analysis has conclusively shown that mice have unique inactivation for several loci, the mechanisms by which genes escape inactivation in one species versus another remain unclear. X;autosome translocation studies suggest that escape correlates with increased distance from the XIC and differs depending on genomic context: which chromosome, and which segment of a given chromosome, are involved [43, 59, 122]. Inducible Xist transgenes inserted into human autosomes have confirmed that localization differs depending on insertion into the p versus q arm of a given chromosome (personal communication: Sharan Sidhu and Jennifer Chow). The spread of silencing to these regions might also differ, depending on the composition of possible way stations, such as L1 or other repetitive elements [42, 106].
Similarly, in this study, because genes have been shuffled on the X (to an extent that depends on the eutherian) despite conserved synteny across the bulk of the chromosome, loci have been repositioned to different genomic contexts, which could affect silencing. What is necessary to delineate the factors correlated with escape are high-resolution X chromosome maps from numerous eutherians clearly showing gene order, an extensive analysis of the expression status of numerous X-linked genes in multiple eutherians, and genomic context information about their X chromosomes, especially of L1 and LTRs, which might play a role in silencing [42, 106]. These data, of course, are not readily available, making correlations with escape from X inactivation difficult. Although the genomes of many eutherians are currently being sequenced, determining the gene order and expression status along the X will require concerted effort by many researchers. The methylation analysis used in this study is a limited means to assay for expression along the inactive X chromosome. The genes investigated are restricted to those containing CpG islands near their promoters. Genes lacking CpG islands tend to be variable in expression [99], and this assay makes studying the regulation of these genes infeasible. The number of CpG islands whose methylation status can be assessed is limited by the number of HpaII restriction sites within the PCR product. Although primers are designed to flank numerous HpaII restriction sites, a single unmethylated CpG could lead to unsuccessful PCR amplification, suggesting hypomethylation of the CpG island. It is unclear whether the methylation status of a single CpG reflects the global methylation status of the entire island studied, although this could be tested using sodium-bisulfite sequencing.
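The dependence of this assay on every HpaII site within an amplicon can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and the example sequence are hypothetical, and the model assumes the standard behaviour that HpaII cuts its CCGG recognition site only when the site is unmethylated.

```python
HPAII_SITE = "CCGG"  # HpaII recognition sequence; cutting is blocked by CpG methylation


def find_hpaii_sites(amplicon: str) -> list[int]:
    """Return the 0-based start position of every CCGG site in the amplicon."""
    seq = amplicon.upper()
    positions = []
    start = seq.find(HPAII_SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(HPAII_SITE, start + 1)
    return positions


def amplifies_after_digest(site_methylated: list[bool]) -> bool:
    """Predict whether PCR across the sites succeeds after HpaII digestion.

    The template stays intact only if every site is methylated; a single
    unmethylated site is enough for the template to be cut.
    """
    return all(site_methylated)


# Hypothetical amplicon with two HpaII sites (not a sequence from this study)
example_amplicon = "ATGCCGGTTAGCACCGGTA"
example_sites = find_hpaii_sites(example_amplicon)
```

Under this model, a lack of amplification shows only that at least one site was unmethylated, which is why a single CpG can dominate the result.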
Furthermore, methylation analysis does not allow one to investigate the role of other epigenetic differences in regulating, maintaining, or predisposing X-linked genes to a particular expression status. Differences such as H3K4 or K9 methylation, H3K9 acetylation, and H4 acetylation could cause variations in expression. The difference in L1 and CpG island density could also distinguish monoallelically expressed from biallelically expressed genes, as is the case for autosomal, random monoallelically expressed loci [44]. Future experiments using ChIP (chromatin immunoprecipitation) and correlations made by genomic sequence analyses could address these additional candidate factors in controlling expression patterns on the X chromosome.

Chapter V Material and Methods

5.1) Polymerase Chain Reaction (PCR)

25 µl PCRs were set up in Biometra or Techne Genius Thermal Cyclers with 1 min denaturation at 94°C, 1 min annealing at 50-62°C, and 2-3 min elongation at 72°C, depending on the primers (please see the primer table for corresponding properties and conditions). Each reaction contained 200 ng template, 20 µM dNTPs, 1.5 mM MgCl2, 1 µl 10X PCR buffer (200 mM Tris-HCl [pH 8.4], 500 mM KCl), 1 µM primer pair, and 0.625 units (U) Taq DNA polymerase (all from Gibco/BRL except primers, which were ordered from the UBC NAPS sequencing centre). Occasionally, the amplification of CG-rich regions, for example the CpG islands in the methylation analysis, required 1 or 2 M betaine, which decreases secondary structure formation to allow for more efficient amplification, or a change in MgCl2 concentration. Amplifying long PCR fragments, such as those spanning large coast mole Xist regions greater than 3 kb, necessitated the use of the Expand Long Template system (Boehringer Mannheim). Two separate 25 µl mixes, kept on ice, were combined into a total 50 µl volume for amplification.
The first solution consisted of primers, dNTPs, and DNA in water, at the same final concentrations as for standard PCR above. The second mix contained 2 U of enzyme mix, Buffer 3 (400 mM Tris-HCl [pH 8.4], 1 M KCl, 0.75 mM MgCl2) and water. Once combined, the reaction was overlaid with 30 µl mineral oil before being placed in the thermocycler for an initial 2 min denaturation at 94°C, followed by 40 cycles of 30 sec at 94°C for denaturing, 30 sec at 50-54°C for annealing (depending on the primers), and 3-7 min at 68°C for elongation (depending on fragment size). The 40 cycles were followed by a 68°C extension for 7 min to ensure that partial fragments were completely synthesized.

5.2) Cloning

The pGEM-T vector system (Promega) was used to TA-clone PCR products, both to increase the concentration of low-yield products for sequencing and to ensure that the desired products were sequenced. It was required in cases where sequencing reactions failed, possibly due to low concentration of templates, suboptimal conditions for annealing of user-supplied primers, or remnant unwanted products in gel-purified samples. Propagating the PCR fragment via bacterial clones increased the template concentration, ensured that a single product was sequenced, and eliminated the problem of user-supplied primers, since standard vector primers were used in the sequencing reactions. As described in the pGEM-T vector system manual, 10 µl ligations were carried out using the Rapid Ligation Buffer (60 mM Tris-HCl [pH 7.8], 20 mM MgCl2, 20 mM DTT, 2 mM ATP, 10% polyethylene glycol [MW 8000, ACS Grade]), 3 U of T4 ligase, 50 ng pGEM-T vector, and the nanogram amount of PCR product appropriate to its size to give a 1:1 vector:insert ratio.
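The "appropriate nanogram amount" of insert for a given molar ratio follows from the relative lengths of insert and vector. The Python sketch below shows that arithmetic. It assumes a vector length of 3000 bp, which is approximately the size of pGEM-T and should be checked against the vendor's documentation. The function name and the 1.5 kb example insert are hypothetical and are not taken from this study.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_to_vector: float = 1.0) -> float:
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Equal molar amounts of two DNA fragments have masses proportional to
    their lengths, so the vector mass is scaled by insert_bp / vector_bp.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * insert_bp / vector_bp * insert_to_vector


# Assumed vector length (approximate size of pGEM-T; verify before use)
ASSUMED_VECTOR_BP = 3000

# Hypothetical 1.5 kb PCR product ligated to 50 ng of vector at a 1:1 ratio
example_ng = insert_mass_ng(50, ASSUMED_VECTOR_BP, 1500)
```

With these assumed values, a 1.5 kb insert is half the length of the vector, so half the vector mass (25 ng) gives a 1:1 molar ratio.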
The ligation utilized the A overhangs preferentially left on the product by Taq polymerase (or any other DNA polymerase without 3'->5' exonuclease activity) in PCR reactions and the T overhangs supplied by the vector, a design that reduces self-ligation of the vector or insert. The reaction was mixed by pipetting and remained at 4°C overnight to allow for efficient ligation. Transformation was carried out using 50 µl JM109 High Efficiency Competent Cells (Fisher) that were transferred to 2 µl of each ligation reaction after thawing from -70°C. The tubes were mixed by gently flicking and were then placed on ice for 20 min, after which the cells were heat-shocked for 45-50 s at 42°C and immediately returned to ice for another 2 min. To propagate the transformants, 950 µl of SOC medium was added and the bacteria were incubated for 1.5 hours at 37°C with shaking at 150 rpm. 10 and 100 µl of each transformation culture were spread onto LB, ampicillin (50 µg/ml), IPTG, and X-gal agar plates that were incubated overnight at 37°C. Plasmid minipreps were prepared from white colonies for insert analysis. The presence and size of inserts were confirmed via restriction enzyme analysis by NotI (8-cutter) and NcoI (6-cutter) double digests.

5.3) Restriction Digest (PstI) Cloning

Several coast mole PCR fragments were in low concentration due to poor amplification, even after several coast mole-specific primers and many PCR conditions were attempted. Before cloning the PCR product into vectors to increase the yield, the PCR sample was treated with a proteinase (proteinase K, Invitrogen) followed by phenol-chloroform extraction to remove proteins and salts. The clean PCR product was digested with PstI (6-bp cutter) to give bands of different sizes and cloned into the pBSII Bluescript vector (Stratagene), which was cut with the same enzyme and treated with shrimp alkaline phosphatase (SAP) to prevent self-ligation of the vector.
A 1:1 insert:vector ratio was used in an overnight ligation at 4°C, followed by transformation into DH5 alpha competent bacteria, which were grown on LB plates with IPTG and X-gal and picked the next morning using blue-white screening. White clones were propagated in LB media at 37°C in a shaker for 16 hours. Inserts were analyzed using plasmid minipreps followed by restriction enzyme analysis with PstI, to select positive clones for sequencing.

5.4) 5' and 3' RACE

Since PCR amplification of the 5' and 3' ends of coast mole Xist proved problematic using conserved primers designed from the multiple alignment of cow, human, and mouse, Rapid Amplification of cDNA Ends (RACE) (GeneRacer Kit, Invitrogen) was used to amplify the promoter and UTR regions of coast mole Xist, as well as to reach a poly-A signal. Alignments of sequenced coast mole Xist regions indicated that the gene ended ~2 kb and ~2.5 kb away from the 5' and 3' termini of the sequenced DNA, respectively. However, judging from mouse, vole, and human Xist, the largest exon at the 3' end (~4.5 kb; human exon 6, mouse exon 7) contains many alternative splice donor sites that lead to earlier termination of the exon. Human exon 6 contains a splice donor site 3 kb into the exon. Since I had sequenced 2.5 kb of coast mole exon 6, 3' RACE was performed to obtain the last portions of the mole Xist gene. 5' RACE was also attempted, since this system reportedly recovers 5' and 3' ends as far as 2 kb from available sequence. For 5' RACE, the GeneRacer method involved treating total coast mole RNA with calf intestinal phosphatase (CIP) to remove 5' phosphates from non-capped RNA and non-mRNA.
Treatment of the dephosphorylated RNA with tobacco acid pyrophosphatase (TAP) removed the 5' cap structure from intact, full-length RNA and exposed a 5' phosphate, preparing it for subsequent ligation to the GeneRacer oligo, whereas truncated mRNA and non-mRNA, whose 5' phosphates had been removed in the initial CIP step, could not undergo this ligation. Ligation of the supplied oligo by T4 ligase provided a unique priming site for GeneRacer primers to generate full-length cDNA from mRNA. 3' RACE did not require any dephosphorylation or decapping steps. The RNA was reverse transcribed using a user-designed gene-specific primer near the 5' end or, in the case of 3' RACE, the GeneRacer Oligo dT primer. Next, a PCR reaction was performed with a supplied GeneRacer 5' or 3' primer and a complementary gene-specific primer under the following conditions:

Temperature (°C)    Time (minutes)    Cycles
94                  2                 1
94                  0.5               5
72                  2                 5
94                  0.5               5
70                  2                 5
94                  0.5               20
65                  0.5               20
68                  2                 20
68                  10                1

For both 5' and 3' RACE, the supplied total HeLa RNA was used as an internal control for proper conditions and was run alongside the coast mole reactions. It is uncertain whether Xist RNA is capped; therefore, total human RNA was used as a control in 5' RACE, with human Xist-specific primers in the RT and PCR steps.

5.5) Gel Extraction and DNA Purification

PCR products were purified by first excising the PCR band from a low-melt gel and then cleaning up the DNA with the Qiagen QIAquick PCR purification kit (QIAGEN Cat. No. 287704). The agarose fragment was solubilized using Buffer QG and then placed in a supplied silica-gel membrane spin column, to which DNA adheres during centrifugation, whereas the remaining agarose passed through into the collecting tube and was discarded. Additional Buffer QG was added to the column and spun through to remove trace amounts of agarose.
The sample was washed several times with the ethanol-based Buffer PE to remove any salts, and the final DNA was eluted with 30 µl of water into a clean 1.5 ml Eppendorf tube.

5.6) NCBI, BCM Search Launcher

Searches for sequences similar to the Xist gene from different species used the nucleotide-nucleotide BLAST program (blastn) available at NCBI (http://www.ncbi.nlm.nih.gov/). Pairwise alignments of Xist orthologs were performed with the bl2seq BLAST tool in order to note locations relative to human or mouse XIST/Xist and to approximate exonic boundaries. The Map Viewer available at NCBI was used to compare the X chromosomes of dog, rat, cow, mouse, and human, as well as to confirm the locations of X-linked genes or Y homologs. Multiple sequence alignments were generated with ClustalW v. 1.6, available on the BCM Search Launcher website (http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html), to design degenerate primers for sequencing of mole Xist.

5.7) Nucleic Acid Dotplots

The dot plot program available online at http://arbl.cvmbs.colostate.edu/molkit/dnadot/ was used to align pairs of sequences with poor local similarity, for which BLAST alignment was infeasible. This program allowed the detection of repetitive regions and global similarities within the sequences. A sliding window size of 19 and a mismatch limit of 4 were generally used, such that the program placed a dot at the middle of the window when at least 15 out of 19 bases matched between the two sequences.

5.8) Tandem Repeat Finder

Xist sequences from each of the eutherians were submitted to the Tandem Repeat Finder (http://tandem.bu.edu/trf/trf.advanced.submit.html). The advanced input form was used so that all parameters could be adjusted for low-stringency detection of tandem repeats. The alignment parameters were set at (2, 7, 7), the weights for matches, mismatches, and insertions/deletions, respectively.
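The two numerical criteria described in sections 5.7 and 5.8 can be illustrated with a short sketch. This is an illustration only, not the code behind either web tool: `dotplot_hits` applies a 19-base window that tolerates at most 4 mismatches, and `trf_style_score` assumes that the (2, 7, 7) triple acts as a match weight plus mismatch and indel penalties in a simple additive score. Both function names are hypothetical.

```python
# Minimal sketch of the window rule (section 5.7) and an additive
# alignment score with (match, mismatch, indel) weights (section 5.8).
# Assumption: neither function reproduces the actual tools' internals.


def dotplot_hits(seq_a, seq_b, window=19, max_mismatches=4):
    """Return (i, j) window-centre coordinates (0-based) where the two
    windows differ at no more than max_mismatches positions."""
    a, b = seq_a.upper(), seq_b.upper()
    half = window // 2
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            mismatches = 0
            for k in range(window):
                if a[i + k] != b[j + k]:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # too many mismatches: no dot for this window
            else:
                # Loop finished without exceeding the limit: place a dot.
                hits.append((i + half, j + half))
    return hits


def trf_style_score(matches, mismatches, indels, weights=(2, 7, 7)):
    """Additive score: reward matches, penalise mismatches and indels."""
    match_w, mismatch_w, indel_w = weights
    return match_w * matches - mismatch_w * mismatches - indel_w * indels


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGTACG"       # 19 bases
    four_off = "TGCA" + ref[4:]       # 4 mismatches: still gives a dot
    five_off = "TGCAC" + ref[5:]      # 5 mismatches: no dot
    print(dotplot_hits(ref, four_off))
    print(dotplot_hits(ref, five_off))
    print(trf_style_score(matches=25, mismatches=0, indels=0))
```

Under the additive assumption, an alignment would need at least 25 matching positions to reach the minimum score of 50 used for this search, and each mismatch or indel would have to be offset by several additional matches.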
The minimum alignment score required to report a repeat was 50, and the maximum period size was restricted to 500. The output for each repeat included its location, consensus sequence, and copy number.

5.9) CARNAC

CARNAC was employed to predict common secondary structures between Xist orthologs (public webserver, http://bioinfo.lifl.fr/carnac/). The sequences were entered in FASTA format, and the parameters were set to eliminate redundant sequences and to account for GC content within the sequences, without allowing isolated stems. The stems found in all orthologs were displayed for each sequence in three formats: a Connect (ct) file, which provided a textual description of the base pairings, and a PostScript (ps) file and a JPEG (jpg) file, which both provided graphical representations of the secondary structures. The JPEG and PostScript files were generated automatically from the ct file using the freely distributed drawing tool Naview. When no common stems were detected, the message 'No structure found' was displayed. To visualize all predicted foldings at once for easy comparison, RNAfamily, a Java applet dedicated to presenting multiple RNA sequences, was downloaded from the CARNAC site.

5.10) RNAalifold

ClustalW alignments were first obtained for the orthologous sequences in question (http://www.ebi.ac.uk/clustalw/). The alignments, saved as text files, were then uploaded to the RNAalifold server (http://rna.tbi.univie.ac.at/cgi-bin/alifold.cgi) for analysis using the default parameters. Alignment lengths were limited to 1.5 kb. In particular, the fold algorithm "partition function and pair probabilities" was selected; the weight of the covariance term was 1; the penalty for non-compatible sequences was 1; and energy parameters were scaled to a temperature of 37°C. The output consisted of mountain plots, energy dotplots, and predicted consensus structures of the orthologs, given in PostScript format.
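To make the Connect (ct) output mentioned in section 5.9 more concrete, the sketch below reads base pairs from a ct-style table and groups stacked pairs into stems. It assumes the common mfold-style layout (a header line that begins with the sequence length, then one row per base with the pairing partner in the fifth column and 0 meaning unpaired); the exact CARNAC output may differ, and the example hairpin and the function names are invented for illustration.

```python
# Minimal sketch: parse a ct-style table and measure stem lengths.
# Assumption: mfold-style columns (index, base, prev, next, partner, index).

EXAMPLE_CT = """\
10 example hairpin (invented)
1 G 0 2 10 1
2 G 1 3 9 2
3 G 2 4 8 3
4 A 3 5 0 4
5 A 4 6 0 5
6 A 5 7 0 6
7 A 6 8 0 7
8 C 7 9 3 8
9 C 8 10 2 9
10 C 9 0 1 10
"""


def read_ct_pairs(ct_text):
    """Return {i: j} for every base pair with i < j (1-based positions)."""
    rows = [line.split() for line in ct_text.strip().splitlines() if line.strip()]
    length = int(rows[0][0])  # header starts with the sequence length
    pairs = {}
    for fields in rows[1:length + 1]:
        i, j = int(fields[0]), int(fields[4])
        if j > i:  # record each pair once, from its 5' partner
            pairs[i] = j
    return pairs


def stem_lengths(pairs):
    """Group stacked pairs (i, j), (i+1, j-1), ... and return stem lengths."""
    lengths = []
    seen = set()
    for start in sorted(pairs):
        if start in seen:
            continue
        k, partner, length = start, pairs[start], 0
        while pairs.get(k) == partner:
            seen.add(k)
            length += 1
            k += 1
            partner -= 1
        lengths.append(length)
    return lengths


if __name__ == "__main__":
    pairs = read_ct_pairs(EXAMPLE_CT)
    print(pairs)
    print(stem_lengths(pairs))
```

A list of stem lengths of this kind is the sort of per-stem value that can then be tabulated and summarized across species and regions.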
5.11) Mfold

Individual orthologous sequences from each species were entered into the Mfold webserver version 3 (http://mfold.burnet.edu.au/) or version 3.1 (http://www.bioinfo.rpi.edu/applications/mfold/old/dna/form1.cgi). Default parameters were used: the folding temperature was 37°C, the percent suboptimality was 5, the upper bound on the number of computed foldings was 50, ionic conditions were 1 M NaCl without divalent ions, and the default window parameter was used. No pairing constraints were applied.

5.12) R Statistics Package

Original as well as randomized Xist datasets were input to the R Statistics package (version for Windows), downloaded from http://cran.r-project.org. The data sets were read by the program as tab-delimited text files with four columns: species, region of Xist, run number, and stem length. Each row represented a single stem detected by CARNAC as being conserved between species. Histograms of frequency versus stem length were generated for both randomized and original data sets from the command line. The range, median, mean, and standard deviation (SD) of the plotted data were also obtained using the R software.

5.13) Tissue Culture

Female and male coast mole fibroblast cell lines were established previously by Sanja Karalic from fresh organs of moles sacrificed in a study by Dr. Kevin Campbell [84]. The female cat (CCL 176), cow (CCL 209), and rabbit (CCL 193) fibroblast cell lines, as well as the male cow (CCL 207) fibroblast cell line, were purchased from the American Type Culture Collection (ATCC) (http://www.atcc.org/catalog/cellBiology/cellBiologyIndex.cfm). These were immortal cell lines derived from the embryonic tongue of a cat, a main stem pulmonary artery of a young female cow, a lung biopsy from a rabbit, and the trachea of a male cow.
The mouse BMSL2 cell line was derived from fibroblasts of F1 generation mice and contains both inactive and active Xs with distinguishable alleles due to heterozygosity [140]. This cell line was split from a growing flask and propagated in a separate T25 (25 cm²) flask. After thawing from liquid nitrogen, the coast mole cells were maintained at 37°C in alpha minimal essential medium (alpha MEM, Gibco/BRL) with 15% fetal calf serum (FCS, Cansera), 1% non-essential amino acids (NEAA), 1% penicillin/streptomycin (P/S), and 1% L-glutamine (all from Gibco/BRL). The cat, cow, and rabbit cells required only 10% FCS, again with NEAA, P/S, and L-glutamine. BMSL2 female mouse cells were maintained in MEM with 7.5% FCS and without NEAA.

To split growing cells, 90-100% confluent T25 flasks were rinsed in 1X phosphate-buffered saline (PBS), trypsinized with 0.25% trypsin-EDTA, and then resuspended in 10 ml of media. The exception was the coast mole fibroblasts, which required scraping instead of trypsinizing to lift the adherent cells from the flask surface. Then 1 ml of the coast mole suspension, or 0.3 ml of an ATCC cell suspension, was transferred to a new flask containing the appropriate media to seed new colonies. To prepare DNA or RNA, the cell suspensions were centrifuged at room temperature. After removing the supernatant, the cell pellets were either stored at -70°C until further use or used immediately for RNA and DNA extraction. Freezing cell lines for long-term storage required similar steps to making cell pellets, except that after centrifuging and removing the supernatant, the pellet was resuspended in RPMI with 15% FCS and 10% dimethyl sulfoxide (DMSO) and then transferred to a cryovial (1 ml of solution per 50% confluent T25 flask). The cryovial was kept in an isopropanol-filled container to cool slowly at -70°C for at least 4 hours before being placed in liquid nitrogen for long-term storage.
5.14) RNA Extraction

The acid guanidinium-phenol-chloroform RNA extraction was performed as initially described by Chomczynski and Sacchi in 1987 [141]. Solution D, containing guanidinium thiocyanate, was added to the cell pellet (0.6 ml per confluent 60 mm dish), followed by vortexing to disrupt the cell membranes. Guanidinium and water complex with RNA to prevent hydrophilic interactions with DNA and proteins, such that RNA remains in the aqueous layer after phenol-chloroform extraction. Following Solution D, an equal volume of phenol saturated with diethyl pyrocarbonate (DEPC)-treated water (DEPC inactivates RNases; phenol acts as a deproteinizing agent) and 0.2 M sodium acetate (pH 4) were added. The sample was vortexed for thorough mixing. Then 0.4 volumes of chloroform were added and the preparation was placed on ice for 5-15 min to remove any traces of phenol, followed by centrifugation (10 min, 13,500 rpm). The top layer containing RNA was transferred to a new 1.5 ml Eppendorf tube, while the lower layer containing DNA and proteins was discarded. One volume of isopropanol was added to the RNA solution and the sample was stored overnight at -20°C, where the sodium acetate salt helped to precipitate the RNA in the alcohol. The next morning, the sample was centrifuged for 10 min at 13,500 rpm to pellet the RNA, which was briefly rinsed with 70% ethanol to remove excess traces of salt and then centrifuged again to re-pellet the RNA. After removing the alcohol supernatant, the pellet was air-dried, resuspended in 50 µl of DEPC-water, and stored at -20°C.

5.15) Reverse Transcription

Before carrying out a reverse transcription (RT) reaction to make cDNA, the RNA sample was DNase-treated to remove trace amounts of DNA. For this, 1/20th volume of porcine RNase inhibitor (RNasin, Amersham Pharmacia Biotech) and 1/10th volume of RNase-free DNase were combined with the RNA in DEPC-ddH2O and incubated for one hour at 37°C.
A phenol-chloroform extraction was performed to remove proteins, salts, and buffer from the DNA-free RNA. An equal volume of 1:1 phenol:chloroform solution was added, followed by vortexing and centrifuging for 10 minutes at 12,500 rpm at 4°C. The upper aqueous layer was transferred to a new tube, after which 1 volume of chloroform was added and the sample was vortexed and centrifuged. To precipitate the RNA, 0.2 M sodium acetate and 1 volume of isopropanol were added, as above, and the sample was left overnight at -20°C. In the morning, the RNA was spun for 10-15 min in a 4°C centrifuge, the supernatant was removed, and the pellet was air-dried and resuspended in 50 µl of DEPC-water. The RT reaction itself combined 5 µg of RNA, 1X first-strand buffer (Gibco/BRL), 0.01 M dithiothreitol (DTT) (Invitrogen), 0.0625 mM dNTPs (Gibco/BRL), 1 µl of random hexamers, 2 µl (1 U) of RNasin (Invitrogen), and 1 µl (1 U) of M-MLV (Moloney Murine Leukemia Virus) reverse transcriptase, brought up to a total volume of 20 µl with DEPC-water. The reaction mix sat for 5 min at room temperature, followed by incubation for 2 hours at 42°C. The mix was subsequently incubated at 95°C for 5 minutes, and the cDNA was stored at -20°C until further use.

5.16) DNA Extraction

For cell pellets from a confluent T25 flask, 1 ml of Tris-EDTA (TE) buffer (pH 7.5-8), 1/20th volume of 20% sodium dodecyl sulfate (SDS), and 1 µl of proteinase K were added. The sample was incubated at room temperature overnight. The SDS detergent ruptures the cell membranes to expose nucleic acids, while proteinase K digests a wide array of proteins. The next morning, 0.5 M NaCl was added, followed by incubation for 2 hours at 37°C until the sample was in solution. After adding 1.7 M NaCl, the sample was shaken vigorously and centrifuged at room temperature for 15 min at 2,500 rpm.
The supernatant was transferred to a new tube, and 1/30 volume of 20% SDS as well as 1.7 M NaCl were added. The mixture was again shaken vigorously and spun for 15 min at 2,500 rpm. Finally, the supernatant was transferred to a new vial and 2 volumes of ethanol were added to precipitate the DNA. The DNA was then recovered, suspended in TE, and stored at 55°C for two hours to dissolve. Alternatively, when little DNA was retrieved, the DNA in ethanol was stored at 4°C overnight for precipitation. The sample was centrifuged the following day to retrieve a pellet, which was air-dried and resuspended in TE.

5.17) UCSC - Degenerate Primers for CpG Islands

Primers for methylation analysis were devised from sequence information supplied by the UCSC (Santa Cruz) genome browser (http://genome.ucsc.edu/) using the human genome May 2004 assembly. Regions with 40-60% GC content located within 2 kb of the X-linked genes were considered CpG islands. In the viewing options, conservation plots were displayed in order to visualize the multiple sequence alignment available for each island. Degenerate primers were designed from regions of high conservation within the X-linked CpG islands of interest to give PCR product sizes of 100-700 bp. Often, the parts of the CpG islands that overlapped with exonic regions of nearby genes were the most conserved and useful.

5.18) Methylation Analysis

Methylation analysis was performed on cow, mole, human, and mouse genomic DNA to assess the expression status of X-linked genes. To first digest the DNA into smaller pieces, 4 µg of genomic DNA was combined with 1/10 volume of the appropriate restriction enzyme buffer and 10 U of EcoRI, and brought up to a total volume of 40 µl. The solution was subjected to RNase treatment and phenol-chloroform extraction (described above).
To evaluate the presence or absence of methylation marks at CpG islands 5' of Zfx, Crsp2, Utx, Ube1, Jarid1C, Fmr1, and Ar, the restriction enzyme HpaII was used. This enzyme cuts only unmethylated CCGG recognition sites and does not cut methylated recognition sites. Subsequent PCR amplification of HpaII-digested samples using primers to the CpG island of interest therefore gave bands on an agarose gel only when the locus was methylated (reflective of inactive status). To make HpaII digests, 2 µg of the EcoRI digest was added to a mixture of 1/10 volume HpaII restriction buffer and 10 U of HpaII enzyme (methylation sensitive, CCGG) and brought up to 20 µl with water, for a final genomic DNA concentration of 100 ng/µl. From the same EcoRI digest, a similar digest was done with MspI, which served as a cutting control. MspI recognizes the same site (CCGG) as HpaII but cuts whether the site is methylated or not. A third digestion was set up from the initial EcoRI digests. This was a mock digest in which no restriction enzyme was added (a glycerol/buffer solution was used instead) but which contained the same buffer used for HpaII. All three types of digests were incubated at 37°C for optimal cutting. PCRs of the digests were performed using degenerate primers designed for the X-linked CpG islands of interest. Absence of bands from HpaII digests signified an active status at the locus (Figure 5.1). The mock digest served as an internal control for the presence of DNA, tolerable incubation conditions, and successful PCR conditions.

[Figure 5.1: schematic with labels HpaII, CH3, EcoRI, PCR, and gel]

Figure 5.1. Methylation Analysis. Genomic DNA samples from male and female eutherians were digested with HpaII, which cuts at unmethylated CpG recognition sites. PCR amplification of the region of interest reveals a band only if the locus was methylated, indicative of inactive status for most genes.

References

1. Schartl, M.
, Sex chromosome evolution in non-mammalian vertebrates. Curr Opin Genet Dev, 2004. 14(6): p. 634-41.
2. Graves, J.A., Mammals that break the rules: genetics of marsupials and monotremes. Annu Rev Genet, 1996. 30: p. 233-60.
3. Margulies, E.H., Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme genomes. PNAS, 2005. 102(9): p. 3354-3351.
4. Murphy, W.J., et al., Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science, 2001. 294(5550): p. 2348-51.
5. Springer, M., et al., Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol Biol Evol., 2001. 18(2): p. 132-4.
6. Delsuc, F., H. Brinkmann, and H. Philippe, Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet., 2005. 6(5): p. 361-75.
7. Thomas, J.W., et al., Comparative analyses of multi-species sequences from targeted genomic regions. Nature, 2003. 424(6950): p. 788-93.
8. Zarkower, D., Establishing sexual dimorphism: conservation amidst diversity? Nat Rev Genet, 2001. 2(3): p. 175-85.
9. Hodgkin, J., Sex determination and dosage compensation in Caenorhabditis elegans. Ann. Rev. Genet, 1987. 21: p. 133-154.
10. Csankovszki, G., P. McDonel, and B.J. Meyer, Recruitment and spreading of the C. elegans dosage compensation complex along X chromosomes. Science, 2004. 303(5661): p. 1182-5.
11. Meller, V.H., et al., Ordered assembly of roX RNAs into MSL complexes on the dosage-compensated X chromosome in Drosophila. Curr Biol, 2000. 10(3): p. 136-43.
12. Graves, J.A., Sex and death in birds: A model of dosage compensation that predicts lethality of sex chromosome aneuploids. Cytogenet Genome Res., 2003. 101(3-4): p. 278-282.
13. Kuroda, Y., et al., Absence of Z-chromosome inactivation for five genes in male chickens. Chromosome Res., 2001. 9(6): p. 457-68.
14. Kuroiwa, A., et al., Biallelic expression of Z-linked genes in male chickens. Cytogenet.
Genome Res., 2002. 99: p. 310-314.
15. Bisoni, L., et al., Female-specific hyperacetylation of histone H4 in the chicken Z chromosome. Chromosome Res., 2005. 13(2): p. 205-14.
16. Richardson, B.J., A.B. Czuppon, and G.B. Sharman, Inheritance of glucose-6-phosphate dehydrogenase variation in kangaroos. Nature New Biol., 1971. 230: p. 154-155.
17. Migeon, B.R., S. Jan de Beur, and J. Axelman, Frequent derepression of G6PD and HPRT on the marsupial inactive X chromosome associated with cell proliferation in vitro. Exp. Cell Res., 1989. 182: p. 597-609.
18. Graves, J.A. and M. Westerman, Marsupial genetics and genomics. Trends Genet, 2002. 18(10): p. 517-21.
19. Brown, C.J., et al., The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell, 1992. 71: p. 527-542.
20. Hendrich, B.D., C.J. Brown, and H.F. Willard, Evolutionary conservation of possible functional domains of the human and murine XIST genes. Hum. Mol. Genet., 1993. 2(6): p. 663-672.
21. Okamoto, I., et al., Epigenetic dynamics of imprinted X inactivation during early mouse development. Science, 2004. 303(5658): p. 644-9.
22. Huynh, K.D. and J.T. Lee, Inheritance of a pre-inactivated paternal X chromosome in early mouse embryos. Nature, 2003. 426(6968): p. 857-62.
23. Hoyer-Fender, S., C. Costanzi, and J. Pehrson, Histone macroH2A1.2 is concentrated in the XY-body by the early pachytene stage of spermatogenesis. Exp. Cell Res., 2000. 258: p. 254-260.
24. Richler, C., S.K. Dhara, and J. Wahrman, Histone macroH2A1.2 is concentrated in the XY compartment of mammalian male meiotic nuclei. Cytogenet. Cell Genet., 2000. 89: p. 118-120.
25. Cheng, M.K. and C.M. Disteche, Silence of the fathers: early X inactivation. 2004. 26: p. 821-824.
26. Marahrens, Y., et al., Xist-deficient mice are defective in dosage compensation but not spermatogenesis. Genes & Dev., 1997. 11: p. 156-166.
27. McCarrey, J.R., et al., X-chromosome inactivation during spermatogenesis is regulated by an Xist/Tsix-independent mechanism in the mouse. Genesis, 2002. 34(4): p. 257-66.
28. Armstrong, S.J., et al., Different strategies of X-inactivation in germinal and somatic cells: histone H4 underacetylation does not mark the inactive X chromosome in mouse male germline. Exp. Cell Res., 1997. 230: p. 399-402.
29. Fernandez-Capetillo, O., et al., H2AX is required for chromatin remodeling and inactivation of sex chromosomes in male mouse meiosis. Dev Cell, 2003. 4(4): p. 497-508.
30. Turner, J.M., et al., Meiotic sex chromosome inactivation in male mice with targeted disruptions of Xist. J Cell Sci, 2002. 115(Pt 21): p. 4097-105.
31. Rougeulle, C., et al., Differential histone H3 Lys-9 and Lys-27 methylation profiles on the X chromosome. Mol Cell Biol, 2004. 24(12): p. 5475-84.
32. Kohlmaier, A., et al., A chromosomal memory triggered by Xist regulates histone methylation in X inactivation. PLoS Biol, 2004. 2(7): p. E171.
33. Latham, K.E., X chromosome imprinting and inactivation in preimplantation mammalian embryos. Trends Genet., 2005. 21(2): p. 120-7.
34. Fang, J., et al., Ring1b-mediated H2A ubiquitination associates with inactive X chromosomes and is involved in initiation of X-inactivation. J Biol Chem, 2004.
35. Heard, E., et al., Methylation of histone H3 at Lys-9 is an early mark on the X chromosome during X inactivation. Cell, 2001. 107: p. 727-738.
36. Plath, K., et al., Role of histone H3 lysine 27 methylation in X inactivation. Science, 2003. 300(5616): p. 131-5.
37. Plath, K., et al., Developmentally regulated alterations in Polycomb repressive complex 1 proteins on the inactive X chromosome. J Cell Biol, 2004. 167(6): p. 1025-35.
38. Silva, J., et al., Establishment of histone H3 methylation on the inactive X chromosome requires transient recruitment of Eed-Enx1 polycomb group complexes. Dev Cell, 2003. 4(4): p. 481-95.
39. Csankovszki, G., A. Nagy, and R. Jaenisch, Synergism of Xist RNA, DNA methylation, and histone hypoacetylation in maintaining X chromosome inactivation. J. Cell Biol., 2001. 153: p. 773-783.
40. Brown, C.J. and H.F. Willard, The human X inactivation center is not required for maintenance of X inactivation. Nature, 1994. 368: p. 154-156.
41. Wutz, A. and R. Jaenisch, A shift from reversible to irreversible X inactivation is triggered during ES cell differentiation. Mol. Cell, 2000. 5: p. 695-705.
42. Lyon, M.F., The Lyon and the LINE hypothesis. Seminars in Cell & Developmental Biology, 2003. 14: p. 313-318.
43. Brown, C.J. and J.M. Greally, A stain upon the silence: genes escaping X inactivation. Trends Genet, 2003. 19(8): p. 432-8.
44. Allen, E., et al., High concentrations of long interspersed nuclear element sequence distinguish monoallelically expressed genes. Proc Natl Acad Sci USA, 2003. 100(17): p. 9940-5.
45. Bailey, J.A., et al., Molecular evidence for a relationship between LINE-1 elements and X chromosome inactivation: the Lyon repeat hypothesis. Proc. Natl. Acad. Sci., USA, 2000. 97: p. 6634-6639.
46. Waters, P.D., et al., LINE-1 distribution in Afrotheria and Xenarthra: implications for understanding the evolution of LINE-1 in eutherian genomes. Chromosoma, 2004. 113: p. 137-144.
47. Hansen, R.S., X inactivation-specific methylation of LINE-1 elements by DNMT3B: implications for the Lyon repeat hypothesis. Hum Mol Genet, 2003. 12(19): p. 2559-67.
48. Plath, K., et al., Xist RNA and the mechanism of X chromosome inactivation. Annu Rev Genet, 2002. 36: p. 233-78.
49. Clerc, P. and P. Avner, Role of the region 3' to Xist exon 6 in the counting process of X-chromosome inactivation. Nat. Genet., 1998. 19: p. 249-253.
50. Morey, C., et al., The region 3' to Xist mediates X chromosome counting and H3 Lys-4 dimethylation within the Xist gene. Embo J, 2004. 23(3): p. 594-604.
51. Cattanach, B. and C.
Rasberry, Identification of the Mus spretus Xce allele. Mouse Genome, 1991. 89: p. 565.
52. Simmler, M.C., et al., Mapping the murine Xce locus with (CA)n repeats. Mammalian Genome, 1993. 4: p. 523-530.
53. Shibata, S. and J.T. Lee, Tsix transcription versus RNA-based mechanisms in Xist repression and epigenetic choice. Curr Biol, 2004. 14(19): p. 1747-54.
54. Lee, J.T. and N. Lu, Targeted mutagenesis of Tsix leads to nonrandom X inactivation. Cell, 1999. 99: p. 47-57.
55. Lee, J.T., Disruption of imprinted X inactivation by parent-of-origin effects at Tsix. Cell, 2000. 103: p. 17-27.
56. Sado, T., et al., Regulation of imprinted X-chromosome inactivation in mice by Tsix. Development, 2001. 128: p. 1275-1286.
57. Ogawa, Y. and J.T. Lee, Xite, X-inactivation intergenic transcription elements that regulate the probability of choice. Mol. Cell, 2003. 11: p. 731-743.
58. White, W.M., et al., The spreading of X inactivation into autosomal material of an X;autosome translocation: evidence for a difference between autosomal and X-chromosomal DNA. Am. J. Hum. Genet., 1998. 63: p. 20-28.
59. Sharp, A.J., et al., Molecular and cytogenetic analysis of the spreading of X inactivation in X;autosome translocations. Hum. Mol. Genet., 2002. 11: p. 3145-3156.
60. Chureau, C., et al., Comparative sequence analysis of the X-inactivation center region in mouse, human and bovine. Genome Res., 2002. 12: p. 894-908.
61. Nesterova, T.B., et al., Comparative mapping of X chromosomes in vole species of the genus Microtus. Chromosome Research, 1998: p. 41-48.
62. Brockdorff, N., et al., The product of the mouse Xist gene is a 15kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell, 1992. 71: p. 515-526.
63. Nesterova, T.B., et al., Characterization of the Genomic Xist Locus in Rodents Reveals Conservation of Overall Gene Structure and Tandem Repeats but Rapid Evolution of Unique Sequence.
Genome Research, 2001. 11(5): p. 833-849.
64. Wutz, A., T.P. Rasmussen, and R. Jaenisch, Chromosomal silencing and localization are mediated by different domains of Xist RNA. Nat Genet, 2002. 30(2): p. 167-74.
65. Ganesan, S., et al., BRCA1 supports XIST RNA concentration on the inactive X chromosome. Cell, 2002. 111(3): p. 393-405.
66. Mak, W., et al., Mitotically stable association of polycomb group proteins eed and enx1 with the inactive X chromosome in trophoblast stem cells. Curr Biol, 2002. 12(12): p. 1016-20.
67. de Napoles, M., et al., Polycomb Group Proteins Ring1A/B Link Ubiquitylation of Histone H2A to Heritable Gene Silencing and X Inactivation. Dev Cell, 2004. 7(5): p. 663-76.
68. Fackelmayer, F.O., A stable proteinaceous structure in the territory of inactive X chromosomes. J Biol Chem., 2005. 280(3): p. 1720-3.
69. Caparros, M.L., et al., Functional analysis of the highly conserved exon IV of Xist RNA. Cytogenet Genome Res, 2002. 99: p. 99-105.
70. Migeon, B.R., et al., Identification of TSIX, encoding an RNA antisense to human XIST, reveals differences from its murine counterpart: implications for X inactivation. Am. J. Hum. Genet, 2001. 69: p. 951-960.
71. Shibata, S. and J.T. Lee, Characterization and quantitation of differential Tsix transcripts: implications for Tsix function. Hum. Mol. Genet, 2003. 12: p. 125-136.
72. Xue, F., et al., Aberrant patterns of X chromosome inactivation in bovine clones. Nat Genet, 2002. 31(2): p. 216-20.
73. Brown, C.J. and J.C. Chow, Beyond sense: the role of antisense RNA in controlling Xist expression. Semin Cell Dev Biol, 2003. 14(6): p. 341-7.
74. Farazmand, A., et al., Expression of Xist sense and antisense in bovine fetal organs and cell cultures. Chromosome Res, 2004. 12(3): p. 275-83.
75. Tinker, A.V. and C.J. Brown, Induction of XIST expression from the human active X chromosome in mouse/human somatic cell hybrids by DNA demethylation. Nucl. Acids Res., 1998. 26: p.
2935-2940.
76. Heard, E., et al., Human XIST yeast artificial chromosome transgenes show partial X inactivation center function in mouse embryonic stem cells. Proc. Natl. Acad. Sci., USA, 1999. 96: p. 6841-6846.
77. Schwartz, S., et al., PipMaker - a web server for aligning two genomic DNA sequences. Genome Res., 2000. 10: p. 577-586.
78. Eddy, S.R., Computational genomics of noncoding RNA genes. Cell, 2002. 109(2): p. 137-40.
79. Hofacker, I., Vienna RNA secondary structure server. Nucleic Acids Res, 2003. 31(13): p. 3429-3431.
80. Zuker, M., Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 2003. 31(13): p. 3406-15.
81. Touzet, H., CARNAC: folding families of related RNAs. Nucleic Acids Res, 2004. 32: p. 142-145.
82. Perriquet, O., Find the common structure shared by 2 homologous RNAs. Bioinformatics, 2003. 19(1): p. 108-116.
83. Griffiths-Jones, S., et al., Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res, 2005. 33: p. 121-124.
84. Karalic, S., Studies of X-chromosome inactivation and the identification of the Xist gene in the insectivore Scapanus orarius, in Medical Genetics. 2001, UBC: Vancouver. p. 145.
85. Washietl, S. and I.L. Hofacker, Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. J. Mol. Biol., 2004. 342(1): p. 19-30.
86. Ohhata, T., et al., X-inactivation is stably maintained in mouse embryos deficient for histone methyl transferase G9a. Genesis, 2004. 40(3): p. 151-6.
87. Beletskii, A., et al., PNA interference mapping demonstrates functional domains in the noncoding RNA Xist. Proc. Natl. Acad. Sci., USA, 2001. 98: p. 9215-9220.
88. Rasmussen, T.P., et al., Expression of Xist RNA is sufficient to initiate macrochromatin body formation. Chromosoma, 2001. 110: p. 411-420.
89. Ayling, L.J. and D.K. Griffin, The evolution of sex chromosomes. Cytogenet Genome Res, 2002. 99(1-4): p.
125-40. 90. Kohn, M . , et al., Wide genome comparisons reveal the origins of the human X chromosome. Trends Genet, 2004. 20(12): p. 598-603. 91. Nanda, I., et al., 300 million years of conserved synteny between chicken Z and human chromosome 9. Nat Genet, 1999. 21(3): p. 258-9. 92. Nanda, I., et al., Conserved synteny between the chicken Z sex chromosome and human chromosome 9 includes the male regulatory gene DMRTP. a comparative re(view) on avian sex determination. Cytogenetics and Cel l Genetics, 2000. 89: p. 67-78. 93. Pask, A . and J . M . Graves, Sex chromosomes and sex-determining genes: insights from marsupials and monotremes. Cellular and Molecular Life Sciences, 1999. 55: p. 864-875. 94. Lahn, B.T. , N . M . Pearson, and K . Jegalian, The human Y chromosome, in the light of evolution. Nat Rev Genet, 2001. 2(3): p. 207-16. 95. Lahn, B .T . and D . C . Page, Four evolutionary strata on the human X chromosome. Science, 1999. 286: p. 964-967. 96. Wilcox, S.A., et al., Comparative mapping identifies the fusion point of an ancient mammalian X-autosomal rearrangement. Genomics, 1996. 35(1): p. 66-70. 97. Graves, J .A. , M . J . Wakefield, and R. Toder, The origin and evolution of the pseudoautosomal regions of human sex chromosomes. Hum M o l Genet, 1998. 7(13): p. 1991-6. 98. Carrel, L . , et al., A first-generation X-inactivation profile of the human X chromosome. Proc. Natl. Acad. Sci. , U S A , 1999. 96: p. 14440-14444. 99. Carrel, L . and H.F . Willard, X-inactivation profile reveals extensive variability in X-linkedgene expression in females. Nature, 2005. 434(17): p. 400-404. 100. Jegalian, K . and D . C . Page, A proposed path by which genes common to mammalian X and Y chromosomes evolve to become X inactivated. Nature, 1998. 394: p. 776-780. 101. Craig, I.W., et al., Application of microarrays to the analysis of the inactivation status of human X-linked genes expressed in lymphocytes. Eur J H u m Genet, 2004. 12(8): p. 639-46. 117 102. Coates, J. W. 
, S . M . Schmutz, and C G . Rousseaux, A survey of malformed aborted bovine fetuses, stillbirths, and non-viable neonates for abnormal karyotypes. Can. J. Vet. Res., 1988. 52: p. 258-63. 103. Tsuchiya, K . D . and H.F . Willard, Chromosomal domains and escape from X inactivation: comparative X inactivation analysis in mouse and human. Mamm. Genome, 2000.11: p. 849-854. 104. Luoh, S. W. , et al., CpG islands in human ZFX and Zfy and mouse Zfx genes: sequence similarities and methylation differences. Genomics, 1995. 29: p. 353-363. 105. Ross, M . T . , et al., The DNA sequence of the human X chromosome. Nature, 2005. 434(7031): p. 325-37. 106. Tsuchiya, K . D . , et al., Comparative sequence and X-inactivation analyses of a domain of escape in human Xpl 1.2 and the conserved segment in mouse. Genome Res, 2004. 14(7): p. 1275-84. 107. Filippova, G . N . , et al., Boundaries between chromosomal domains of X inactivation and escape bind CTCF and lack CpG methylation during early development. Dev. Ce l l , 2005. 8: p. 31-42. 108. Carrel, L . , P . A . Hunt, and H.F . Willard, Tissue and lineage-specific variation in inactive X chromosome expression of the murine Smcx gene. Hum. M o l . Genet., 1996. 5: p. 1361-1366. 109. Sheardown, S., et al., The mouse Smcx gene exhibits developmental and tissue specific variation in degree of escape from X inactivation. Hum. M o l . Genet., 1996. 5: p. 1355-1360. 110. Hansen, R.S. , et al., Escape from gene silencing in ICF syndrome: evidence for advanced replication time as a major determinant. Hum M o l Genet, 2000. 9(18): p. 2575-87. 111. Anderson, C . L . , and Brown, C.J . , Epigeneticpredisposition to expression ofTIMPl from the human inactive Xchromosome. 2005, Submitted. 112. Rougeulle, C , P. Navarro, and P. Avner, Promoter-restricted H3 Lys 4 di-methylation is an epigenetic mark for monoallelic expression. Hum M o l Genet, 2003. 12(24): p. 3343-8. 113. Migeon, B .R. , Xchromosome inactivation: theme and variations. 
Cytogenet. Genome Res., 2002. 99: p. 8-16. 114. Lingenfelter, P. A . , et al., Escape from X inactivation of Smcx is preceded by silencing during mouse development. Nat. Genet., 1998.18: p. 212-213. 115. Ohta, T., An examination of the generation-time effect on molecular evolution. P N A S , 1993. 90(22): p. 10676-80. 116. Douzery, E . , J .D. Lebreton, and F . M . Catzeflis, Testing the generation time hypothesis using DNAJDNA hybridization between artiodactyls. J. evol. B i o l . , 1995. 8: p. 511-29. 117. Kumar, S. and S. Subramanian, Mutation rates in mammalian genomes. Proc Natl Acad Sci U S A , 2002. 99(2): p. 803-8. 118. Jorgensen, F . G . , et al., Comparative analysis ofprotein coding sequences from human, mouse and the domesticated pig. B M C B i o l . , 2005. 3(1): p. 2-17. 119. Duthie, S . M . , et al., Xist RNA exhibits a banded localization on the inactive X chromosome and is excludedfrom autosomal material in cis. Hum. M o l . Genet., 1999. 8: p. 195-204. 120. Dobigny, G . , et al., Viability ofX-autosome translocations in mammals: an epigenomic hypothesis from a rodent case-study. Chromosoma, 2004.113(1): p. 34-41. 118 121. Iannuzzi, L . , et al., Comparative FISH mapping of bovidX chromosomes reveals homologies and divergences between the subfamilies bovinae and caprinae. Cytogenet Ce l l Genet, 2000. 89(3-4): p. 171-6. 122. Sharp, A . , D .O. Robinson, and P . A . Jacobs, Absence of correlation between late-replication and spreading of X inactivation in an X;autosome translocation. Hum. Genet, 2001.109: p. 295-302. 123. Chowdhary, B.P. , et al., Emerging patterns of comparative genome organization in some mammalian species as revealed by Zoo-FISH. Genome Res., 1998. 8(6): p. 577-89. 124. Kuroiwa, A . , et al., Comparative FISH mapping of mouse and rat homologues of twenty-five human X-linked genes. Cytogenet Ce l l Genet., 1998. 81(3-4): p. 208-12. 125. 
Robinson, T.J., et al., A molecular cytogenetic analysis of Xchromosome repatterning in the Bovidae: transpositions, inversions, and phylogenetic inference. Cytogenet Ce l l Genet, 1998. 80(1-4): p. 179-84. 126. Piumi, F., et al., Comparative cytogenetic mapping reveals chromosome rearrangements between the X chromosomes of two closely related mammalian species (cattle and goats). Cytogenet Cel l Genet, 1998. 81(1): p. 36-41. 127. Raudsepp, T., et al., Exceptional conservation of horse-human gene order on X chromosome revealed by high-resolution radiation hybrid mapping. Proc Natl Acad Sci . , 2004.101(8): p. 2386-91. 128. Itoh, T., et al., A comprehensive radiation hybrid map of the bovine genome comprising 5593 loci. Genomics, 2005. 85(4): p. 413-24. 129. Dixkens, C , et al., ZOO-FISH analysis in insectivores: "Evolution extols the virtue of the status quo". Cytogenet Cel l Genet., 1998. 80(1-4): p. 61-67. 130. Svartman, M . , et al., A chromosome painting test of the basal eutherian karyotype. Chromosome Res., 2004.12(1): p. 45-53. 131. Gorman, M . L . and R . D . Stone, The natural history of moles. 1990, N e w York: Comstock Publishing, Cornell University Press. 132. Yates, T .L . , A . D . Stock, and D.J . Schmidly, Chomosome banding patterns and the nucleolar organizer region of the Eastern Mole (Scalopus aquaticus). Experientia, 1976. 11(10): p. 1276-77. 133. Yates, T .L . and D.J . Schmidly, Karyotype of the eastern mole (Scalopus squaticus), with comments on the karyology of the family Talpidae. Journal of Mammology, 1976. 56(4): p. 902-05. 134. Poloumienko, A . , Cloning and comparative analysis of the bovine, porcine, and equine sex chromosome genes ZFX andZFY. Genome, 2004. 47(1): p. 74-83. 135. Ryu, S., et al., The transcriptional cofactor complex CRSP is requiredfor activity of the enhancer-binding protein Spl. Nature, 1999. 397(6718): p. 446-50. 136. Mitchell , M . J . 
, et al., The origin and loss of the ubiquitin activating enzyme gene on the mammalian Y chromosome. Hum M o l Genet, 1998. 7(3): p. 429-34. 137. X u , J., P.S. Burgoyne, and A . P . Arnold, Sex differences in sex chromosome gene expression in mouse brain. Hum M o l Genet, 2002.11(12): p. 1409-19. 138. Jensen, L . , et al., Mutations in the JARID1C gene, which is involved in transcriptional regulation and chromatin remodeling, cause X-linked mental retardation. A m J Hum Genet, 2005. 76(2): p. 227-36. 139. Basrur, P .K . , et al., Expression pattern of X-linked genes in sex chromosome aneuploid bovine cells. Chromosome Res, 2004.12(3): p. 263-73. 119 140. Komura, J., et al., In vivo ultraviolet and dimethyl sulfate footprinting ofthe 5' region of the expressed and silent Xist alleles. Journal of Biological Chemistry, 1996. 272(16): p. 10975-10980. 141. Chomczynski, P., and Sacchi, N., Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem., 1987.162: p. 156-159. 120 Appendix Figure A.1. Coast Mole Extended Exon 4 in Xist. Dotplot of coast mole Xist cDNA on X axis, and cow Xist genomic sequence on Y axis, is depicted. Red arrow indicates region in mole sequence that aligns to intronic sequence in cow, between cow exon 4 and exon 5, representing an extended coast mole exon 4. Folding this elongated exon in mole generates a stable hairpin structure via Mfold (B), that is conserved in other eutherians. 
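The kind of comparison shown in Figure A.1 can be approximated with a simple word-match dotplot. The sketch below is illustrative only: the thesis does not state which program produced the figure, and the function name `dotplot_matches`, the word size, and the toy sequences are assumptions made for the example.

```python
def dotplot_matches(seq_x, seq_y, word=4):
    """Return (i, j) pairs where seq_x[i:i+word] equals seq_y[j:j+word].

    Each pair is one dot: i is the position on the X-axis sequence and
    j is the position on the Y-axis sequence.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()

    # Index every word of the Y-axis sequence by its start positions.
    positions_in_y = {}
    for j in range(len(seq_y) - word + 1):
        positions_in_y.setdefault(seq_y[j:j + word], []).append(j)

    # Look up every word of the X-axis sequence in that index.
    matches = []
    for i in range(len(seq_x) - word + 1):
        for j in positions_in_y.get(seq_x[i:i + word], []):
            matches.append((i, j))
    return matches


# Toy data (hypothetical, not Xist): a "cDNA" made of two exons and a
# "genomic" sequence carrying the same exons separated by an intron.
exon_a, intron, exon_b = "ACGTACGG", "CCCCCCCC", "GTTCAGTA"
cdna = exon_a + exon_b
genomic = exon_a + intron + exon_b

matches = dotplot_matches(cdna, genomic, word=4)
# Dots on the same diagonal share the same offset j - i.
offsets = sorted({j - i for i, j in matches})
print(offsets)
```

Each distinct offset corresponds to one diagonal in the plot. A jump from one diagonal to another marks sequence present in only one input, which is how an intron, or an exon extended into intronic sequence, shows up in a cDNA versus genomic comparison.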
[Figure A.2 image: four panels titled Mouse Xist genomic DNA, Mouse Xist cDNA, Human Xist genomic DNA, and Human Xist cDNA, each with one track per remaining species (vole, rat, mole, cow, dog, and human or mouse).]

Figure A.2. Overview of Percent Identity Plots of Eutherian Xist Sequences. Regions of >50% sequence similarity to either the human or mouse XIST/Xist sequence (labeled at the top) are shown in green, while regions of >75% sequence similarity are shown in pink. The repeat regions of the indicated sequence (labeled at the top) are shown in red for A repeat, grey for F repeat, green for B repeat, blue for C repeat, yellow for D repeat, and purple for E repeat. With the exception of mole, genomic Xist sequences from eutherians were aligned to human or mouse genomic XIST/Xist sequences; eutherian Xist cDNA sequences were aligned to human or mouse XIST/Xist cDNA using MultiPipMaker [77].

Figure A.3. Percent Identity Plots of Human and Mouse XIST/Xist Genomic Sequences. The mouse Xist genomic sequence is shown at the top, with exons labeled 1-7 in black boxes. The human XIST genomic sequence is shown at the bottom, with exons labeled 1-6 in black boxes. Percent identities of the six eutherian Xist orthologs relative to the mouse and human sequence, as generated by MultiPipMaker [77], are shown.

[Figure A.4 image: alignment panels A (repeat A; rat, mouse, vole, dog, cow, human) and B (repeat F; human, vole, dog, cow, rat, mouse).]

Figure A.4. Multiple Alignments of Xist Repeats A and F. A) ClustalW alignments of repeat A and B) repeat F. A and F consensus sequences are boxed in red. Nucleotides are colored in green (T), orange (G), blue (A), and pink (C).

Figure A.5. RNAalifold of the Xist 5' end. A) Consensus structure of the first kilobase of Xist exon. This sequence includes the beginning of Xist until after the A repeat. B) Consensus structure of +500 bp to 1.5 kb of Xist. This region includes parts of the A repeat and extends to include the F repeat. Since this region overlaps with that of (A), the analogous structures found in each structure are boxed in green.

Figure A.6. Partial Conservation of Exon 2 in Rodents. CARNAC gave common structures depicted in (A) and (B) in vole and rat only, while RNAalifold generated the consensus structure (C) when inputting rat, mouse, and vole exon 2 sequences from the alignment shown in (D).

Figure A.7. Rodent Xist Exon 6: Partially Conserved CARNAC and Consensus Structures. A) CARNAC structures conserved in only mouse and vole when non-rodents were excluded. B) ClustalW alignment; only part of the sequences is shown. C) RNAalifold consensus structure of rodent exon 6.

Figure A.8. Xist Exon 5 CARNAC Results Without Rodents. Common stems and their corresponding visual structures are shown on the left (A) and right (B), respectively, when rodents were excluded from the input. Mole and human show conserved stems, whereas cow and dog do not. C) Mfold structures in each non-rodent eutherian. D) ClustalW alignment of non-rodent exon 5.

Figure A.9. Rodent versus Non-rodent Combined Internal Exon Region. A) CARNAC structures with rodent input only. B) CARNAC structures with non-rodent input only.

Table A.1. Multiple Species Xist Splice Junctions. Potential splice sites in dog, rat, and mole are indicated in comparison to experimentally determined splice junctions in cow.
Splice sites of mouse, human, and vole can be found in previous work [19-20, 62-63].

Species | Mole | Rat | Dog | Cow
Accession number | N/A | NW_048043.1 (2960970-2986883 bp reverse complement, where 1 = 2960970), GI:34881475 | AAEX01057775, GI:50088291 (17368-24421 bp) + AAEX01057774 (1-25440 bp), pieced together; 1 = beginning of pieced sequence | AJ421481, GI:21425595
Sequence type | cDNA | gDNA | gDNA | gDNA
Size of Xist gDNA/cDNA | 14559, incomplete | ~25914/13732 | ~32267/15520 | 43940/21205
Start | 1 | ~4191 | ~1 | 116080
Exon 1, 3' SD | 12099 | 13899 gagtacagtaagtac | 13319 taccttggtaagctt | 134811 tactgtaagtact
Exon 2, 5' SA | 12100 | 16512 ttttccaggggtgaat | 13927 tttaaagggatgaat | 136216 tcttaaagggatg
Exon 2, 3' SD | 12176 | 16609 gactgagagtctctgccctt | 14026 tccaaaggtgaatct | 136305 caaaggtgaatctt
Exon 3, 5' SA | 12177 | 16714 cttcacaggaacaat | 22791 cttctcaaggaaattcc | 137781 ttctcaaggacat
Exon 3, 3' SD | 12304 | 16853 aaaaaaggtactttg | 22939 aaaagatagtttggg | 137917 aaggtaatgtaag
Exon 4, 5' SA | 12305 | 17604 tttcccccagagtct | 24983 tcttctccagatgtt | 139377 ttttccagatc
Exon 4, 3' SD | 12782 | 17807 aaaataggtaagttt | 25723 ttgtcaggtaagact | 139588 ccagaggtgg
Exon 5, 5' SA | 12783 | Rodent specific exon 5: 18018 gtttttcctaggacaa; Rat exon 6: 18519 tttttttgtagtgccatct | 26088 attttttatagctcct | 142651 ttttgtagctc
Exon 5, 3' SD | 12918 | Rodent specific exon 5: 18160 tacacaagtgagtag; Rat exon 6: 18675 gactgaggtaagtta | 26219 gaatgaagtaagttg | 142783 gaagtaagt
Exon 6, 5' SA | 12919 | Rat exon 7: 18930 gccatttttacaggcttaaa | 27707 ctctcctagatctggct | 143828 actgtagttt
Exon 6, 3' SD | ? | Rat exon 7: 21877 ggcttaggtgagcag; 22682 aggctcagtaagttg | 29272 ctcttgggtgagcgg; 29695 tcaaatggtaaatat | 148720 taaatgggtaagatg

Table A.2. Primers Used for Sequencing Coast Mole Xist. Primers and their sequences, conditions, and product sizes are listed.

Primer pair | Sequences 5' to 3' | [MgCl2] | Cycling conditions | # Cycles | Product size
CMXIST23 / CMXISTR8 | GGTTCTTTCTRGAACATTTTCCRGGATACTAGAGTAACTGCAGCG | 1.5mM | 94-1min 48-1min 72-3min | 35X | Multiple bands
CMXIST14 / CMXIST11 | TTCATATGCACTAATAACAAT / AGCACTGCTCAGAAGCAATGC | 1.5mM | 94-1min 54-1min 72-2min | 35X | 1 band, 500bp-1kb
CMXIST17 / CMXISTR12 | AGCTCACTACCACTGGGCAAC / AGCTGCTTGCAGTCCTCATGT | 2.5mM | 94-1min 52-1min 72-3min | 35X | 1 band, 1-1.6kb
CMXIST18 / CMXIST11 | GTATTGTTGCTGAGGAGTGCTAAC / AGCACTGCTCAGAAGCAATGC | 1.5mM | 94-1min 50-1min 72-2min | 35X | 1 band, 500bp-1kb
CMXIST15 / CMXISTREV9 | CAGCAGAGGGTATTTGGGA / AGTTCATTCATTGTTAACATGGCC | 1.5mM | 94-2min (94-30s 54-30s 68-8min) 68-7min | 40X | 3-4kb
CMXIST12 / CMXISTR6 | TTCTCAGMAGTKCTGGCACATCTG / GAACAGCAGTTCTTTGTAATC | 1.5mM | 94-1min 52-1min 72-3min | 35X | 1.5-2kb
CMXISTR5 / CMXIST10 | ACTAGGCAACAACTCACTGCCAGGTGGAGTTGATAACCTGG | 1.5mM | (94-1min 54-1min 72-2min) 72-7min | 30X | 600bp
CMXIST11 / CMXISTR2 | AGCACTGCTCAGAAGCAATGC / CCTTGCCTTTCTCAAGAGGAAC | 1.5mM | (94-30s 52-30s 68-8min) 68-7min | 40X | 12kb
CMXIST1 / CMXIST2 | CATTGCTGAAGTGGCCTGAGGGTTCCTCTTGAGAAAGGCAAGG | 1.5mM | (94-1min 54-1min 72-2min) 72-7min | 30X | 200bp
CMXISTREV9 / CMXISTREV2 | AGTTCATTCATTGTTAACATGGCC / CCTTGCCTTTCTCAAGAGGAAC | 1.5mM | 94-2min (94-30s 54-30s 68-8min) 68-7min | 35X | 2 bands, 6kb (faint) or 9kb
CMXISTREV2 / CMXISTREV10 | CCTTGCCTTTCTCAAGAGGAAC / TCCCAAATACCCTCTGCTG | 1.5mM | 94-2min (94-30s 54-30s 68-8min) 68-7min | 35X | 5kb
CMXIST21 / CMXIST12 | GCCAATATTTACTTCAAGATGCC / TTCTCAGMAGTKCTGGCACATCTG | Failed to amplify
CMXISTR12 / CMXISTR6 | AGCTGCTTGCAGTCCTCATGT / GAACAGCAGTTCTTTGTAATC | 1.5mM | 94-1min 54-1min or 56-1min 72-2min | 40X | 250bp
CMXISTR6 / CMXIST12 | GAACAGCAGTTCTTTGTAATC / TTCTCAGMAGTKCTGGCACATCTG | 1.5mM | 94-1min 54-1min 72-3min | 35X | 1.5-2kb
CMXIST21 / E6 | GCCAATATTTACTTCAAGATGCC / TTGTGTGCTTCAGTGTCTCTGC | 1.5mM | 94-1min 54-1min 72-3min | 35X | 400bp
CMXIST22 / E6 | GCCAATATTTACTTACTTCAAGATGCC / TTGTGTGCTTCAGTGTCTCTGC | 1.5mM | 94-1min 54-1min 72-3min | 35X | 400bp
E6R / CMXIST12 | GCAGAGACACTGAAGCACACAA / TTCTCAGMAGTKCTGGCACATCTG | 3.0mM | 94-1min 54-1min 72-3min | 35X | 1605bp
CMXISTR15 / CMXIST25 | GCTTTAGAGGAAAGGGGAGGACTGTCTCCCCCTCTTTGTTTCATAC | Failed to amplify
CMXIST26 / CMXISTR16 | AGTCCTCCCCTTTCCTCTAAAGCCCARWGCMRHARAMACACAHTGGCC | 3mM | 94-1min 52-1min 72-3min | 40X | 2 bands, 450bp or 1kb
CMXIST15 / CMXISTR17 | CAGCAGAGGGTATTTGGGA / AATGGGAAGGCAAAGATGGG | 1.5mM | (94-30s 52.6-30s 68-5min) 68-7min | 35X | 3-4kb
CMXISTR17 / CMXIST27 | AATGGGAAGGCAAAGATGGG / GAAGGAAAAGTAGGAGGGGTGG | Failed to amplify
CMXISTREV10 / CMXISTREV3 | TCCCAAATACCCTCTGCTG / AAGGCCAATTAATGAGTTCA | 1.5mM | 94-2min (94-30s 54-30s 68-8min) 68-7min | 35X | 5kb
CMXIST19 / CMXISTREV11 | GAAGAKGGYWCTAACCTYAAKGTACTRGAAAATGTTCYAGAAAGAACC | 1.5mM | 94-1min 52-1min 72-2min | 30X | 900bp mole; 600bp human, cow
CMXIST12 / CMXIST13 | TTCTCAGMAGTKCTGGCACATCTG / TTCTTTTGAGATGTMCTTTTTGATGTT | 1.5mM | 94-1min 52-1min 72-3min (human) or 94-1min 49-1min 72-3min (cow) | 35X | ~400bp human; ~300bp cow; failed in mole

Table A.3. Pairwise Identities of Exons in Eutherians. Human exon 2 is unique in sequence, as reported previously. The lengths of each sequence are indicated. The conservation scores between the first sequence (Seq A) and the second sequence (Seq B) are given in the last column. CM = coast mole, h = human, m = mouse, c = cow; "ex" or "e" = exon.

A) Pairwise Identities of Xist Exon 2.
SeqA  Name   Length  SeqB  Name   Length  Score
1     CM     77      2     vole   83      64
1     CM     77      3     hex2   64      12
1     CM     77      4     mex2   91      48
1     CM     77      5     cex2   90      80
1     CM     77      6     dog    97      83
1     CM     77      7     rat    96      44
2     vole   83      3     hex2   64      14
2     vole   83      4     mex2   91      80
2     vole   83      5     cex2   90      66
2     vole   83      6     dog    97      63
2     vole   83      7     rat    96      60
3     hex2   64      4     mex2   91      21
3     hex2   64      5     cex2   90      14
3     hex2   64      6     dog    97      15
3     hex2   64      7     rat    96      10
4     mex2   91      5     cex2   90      70
4     mex2   91      6     dog    97      64
4     mex2   91      7     rat    96      83
5     cex2   90      6     dog    97      85
5     cex2   90      7     rat    96      61
6     dog    97      7     rat    96      58

SeqA  Name   Len(nt)  SeqB  Name   Len(nt)  Score
1     hex3   137      2     cex3   138      78
1     hex3   137      3     doge3  141      72
1     hex3   137      4     CM     129      67
1     hex3   137      5     rat    138      35
1     hex3   137      6     mex3   132      54
1     hex3   137      7     vole   138      43
2     cex3   138      3     doge3  141      78
2     cex3   138      4     CM     129      74
2     cex3   138      5     rat    138      30
2     cex3   138      6     mex3   132      58
2     cex3   138      7     vole   138      60
3     doge3  141      4     CM     129      72
3     doge3  141      5     rat    138      12
3     doge3  141      6     mex3   132      15
3     doge3  141      7     vole   138      54
4     CM     129      5     rat    138      32
4     CM     129      6     mex3   132      55
4     CM     129      7     vole   138      59
5     rat    138      6     mex3   132      81
5     rat    138      7     vole   138      55
6     mex3   132      7     vole   138      65

SeqA  Name   Len(nt)  SeqB  Name   Len(nt)  Score
1     mole   474      2     vole   213      52
1     mole   474      3     hex4   209      69
1     mole   474      4     mex4   211      52
1     mole   474      5     cex4   208      70
1     mole   474      6     dog    215      67
1     mole   474      7     rat    131      77
2     vole   213      3     hex4   209      77
2     vole   213      4     mex4   211      87
2     vole   213      5     cex4   208      73
2     vole   213      6     dog    215      73
2     vole   213      7     rat    131      86
3     hex4   209      4     mex4   211      77
3     hex4   209      5     cex4   208      87
3     hex4   209      6     dog    215      89
3     hex4   209      7     rat    131      78
4     mex4   211      5     cex4   208      73
4     mex4   211      6     dog    215      74
4     mex4   211      7     rat    131      96
5     cex4   208      6     dog    215      86
5     cex4   208      7     rat    131      77
6     dog    215      7     rat    131      80

SeqA  Name   Len(nt)  SeqB  Name   Len(nt)  Score
1     cow    133      2     dog    132      81
1     cow    133      3     mole   138      75
1     cow    133      4     human  164      78
1     cow    133      5     mouse  197      27
1     cow    133      6     rat    155      21
1     cow    133      7     vole   134      23
2     dog    132      3     mole   138      76
2     dog    132      4     human  164      78
2     dog    132      5     mouse  197      29
2     dog    132      6     rat    155      31
2     dog    132      7     vole   134      60
3     mole   138      4     human  164      74
3     mole   138      5     mouse  197      40
3     mole   138      6     rat    155      38
3     mole   138      7     vole   134      36
4     human  164      5     mouse  197      67
4     human  164      6     rat    155      71
4     human  164      7     vole   134      44
5     mouse  197      6     rat    155      83
5     mouse  197      7     vole   134      64
6     rat    155      7     vole   134      61

D) Pairwise Identities of Xist Unique Rodent Exon 5.

SeqA  Name   Len(nt)  SeqB  Name   Len(nt)  Score
1     vole   103      2     mex5   147      33
1     vole   103      3     rat5   131      49
2     mex5   147      3     rat5   131      84

E) Pairwise Identities of Xist Internal Exon Region (Exons 2-5 for Non-Rodents

SeqA  Name   Len(nt)  SeqB  Name   Len(nt)  Score
1     vole   671      2     mouse  989      52
1     vole   671      3     mole   533      47
1     vole   671      4     human  574      41
1     vole   671      5     cow    568      53
1     vole   671      6     rat    651      72
1     vole   671      7     dog    585      53
2     mouse  989      3     mole   533      45
2     mouse  989      4     human  574      41
2     mouse  989      5     cow    568      54
2     mouse  989      6     rat    651      48
2     mouse  989      7     dog    585      52
3     mole   533      4     human  574      43
3     mole   533      5     cow    568      58
3     mole   533      6     rat    651      33
3     mole   533      7     dog    585      57
4     human  574      5     cow    568      69
4     human  574      6     rat    651      47
4     human  574      7     dog    585      67
5     cow    568      6     rat    651      42
5     cow    568      7     dog    585      84
6     rat    651      7     dog    585      42

Table A.4. Pairwise Identities for Xist Sequences Before the A Repeat.

SeqA  Name   Len(nt)  SeqB  Name   Len(nt)  Score
1     Hum    284      2     Rat    325      56
1     Hum    284      3     Cow    319      74
1     Hum    284      4     Mouse  319      50
1     Hum    284      5     Dog    242      77
1     Hum    284      6     vole   290      59
2     Rat    325      3     Cow    319      46
2     Rat    325      4     Mouse  319      54
2     Rat    325      5     Dog    242      60
2     Rat    325      6     vole   290      55
3     Cow    319      4     Mouse  319      58
3     Cow    319      5     Dog    242      80
3     Cow    319      6     vole   290      51
4     Mouse  319      5     Dog    242      35
4     Mouse  319      6     vole   290      77
5     Dog    242      6     vole   290      38
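The scores in Tables A.3 and A.4 have the form of pairwise percent identities. As a rough illustration of how such a score can be obtained from two already-aligned sequences, here is a minimal Python sketch. It assumes the score is the number of identical positions divided by the number of positions at which both sequences carry a residue; the alignment program used for the tables may define its score differently. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the non-gap columns of a pairwise alignment.

    Assumption: columns where either sequence has a gap are skipped and
    do not count toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns where both sequences have a residue
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, 15 columns, one gap column).
toy_a = "GGATGAATTT-GGAG"
toy_b = "GGGTGAATCTAGGAG"
score = percent_identity(toy_a, toy_b)
print(round(score))
```

Under a different convention, for example dividing by the full alignment length or by the length of the shorter sequence, the same pair would receive a lower score, so values from different tools are only comparable when the convention is known.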

